A long, long time ago
I have a few pictures on this blog, mostly in earlier years, because even with
small pictures, the git repository soon became 80MiB. This is not much in
absolute terms, but the actual Markdown/Haskell/CSS/HTML total size is tiny
compared to the pictures, PDFs and fonts. I realised, probably about ten years
ago, that I needed a better solution, and that I should investigate
git-annex. Then time passed, and I heard
about git-lfs, so I thought that's the way forward.
Now, I recently got interested again in doing something about this repository,
and started researching.
Detour: git-lfs
I was sure that git-lfs, being supported by large providers, would be the
modern solution. But to my surprise, git-lfs is very server-centric, which in
hindsight makes sense, but for a home setup, it's not very good. Maybe I
misunderstood, but git-lfs is more a protocol/method for a forge to store
files, rather than an end-user solution. But then you need to back up those files
separately (together with the rest of the forge), or implement another way of
safeguarding them.
Further details, such as the fact that it keeps two copies of the files (one in
the actual checked-out tree, one in internal storage), mean it's not a good
solution. Well, for my blog it would be, but not in general. Then posts on Reddit about
horror stories, people being locked out of GitHub due to quota, as an example, or
this Stack Overflow
post
about git-lfs constraining how one uses git, convinced me that's not what I
want. To each their own, but not for me: I might want to push this blog's repo to
GitHub, but I definitely wouldn't want in that case to pay for GitHub storage
for my blog images (which are copies, not originals). And yes, even in 2025,
those quotas are real GitHub
limits, and
I agree with GitHub: storage and large bandwidth can't be free.
Back to the future: git-annex
So back to git-annex. I thought it was going to be a simple thing, but oh boy,
was I wrong. It took me half a week of continuous (well, in free time) reading
and discussions with LLMs to understand a bit how it works. I think, honestly,
it's a bit too complex, which is why the workflows
page lists seven (!) levels of
workflow complexity, from fully managed to fully manual. IMHO, respect to the
author for the awesome tool, but if you need a web app to help you manage git,
it hints that the tool is too complex.
I made the mistake of running git annex sync once, only to realise it actually
starts pushing to my upstream repo and creating new branches and whatnot, so
after enough reading, I settled on workflow 6/7, since I don't want another tool
to manage my git history. Maybe I'm an outlier here, but everything automatic
is a bit too much for me.
Once you do manage to learn how git-annex works (on the surface, at least), it
is a pretty cool thing. It uses a git-annex git branch to store
meta-information, and that is relatively clean. If you do run git annex sync,
it creates some extra branches, which I don't like, but meh.
Trick question: what is a remote?
One of the most confusing things about git-annex was understanding its remote
concept. I thought a remote is a place where you replicate your data. But no,
that's a special remote. A normal remote is a git remote, but one that is
expected to be git/ssh with command-line access. So if you have a git+ssh
remote, git-annex will not only try to push its above-mentioned branch, but
also copy the files. If such a remote is on a forge that doesn't support
git-annex, then it will complain and get confused.
Of course, if you read the extensive docs, you just do git config remote.<name>.annex-ignore true, and it will understand that it should not
sync to it.
But, aside from this case, git-annex expects that all checkouts and clones of
the repository are both metadata and data stores. And if you do any annex commands in
them, all other clones will know about them! This can be unexpected, and you
find people complaining about it, but nowadays there's a solution:
git clone dir && cd dir
git config annex.private true
git annex init "temp copy"
This is important. Any leaf git clone must be followed by that annex.private true config, especially on CI/CD machines. Honestly, I don't understand why
clones should be official data stores by default, but it is what it is.
I settled on not making any of my checkouts "stable", but only the actual
storage places. Except those are not git repositories, but just git-annex
storage things, i.e., special remotes.
Is it confusing enough yet?
Special remotes
The special remotes, as said, are what I expected to be the normal git-annex
remotes, i.e. places where the data is stored. But well, they exist, and while
I'm only using a couple of simple ones, there is a large number of
them. Among the interesting
ones: git-lfs, a
remote that also allows storing the git repository itself
(git-remote-annex),
although I'm a bit confused about this one, and most of the common storage
providers via the rclone
remote.
Plus, all of the special remotes support encryption, so this is a really neat
way to store your files across a large number of things, and handle replication,
number of copies, from which copy to retrieve, etc. as you wish.
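To make this concrete, here is roughly what setting up an encrypted rsync special remote and a replication policy looks like (the remote name, host and path below are placeholders, not my actual setup):
git annex initremote mybackup type=rsync rsyncurl=backup.example.com:/srv/annex encryption=shared   # encrypted rsync special remote
git annex numcopies 2          # ask for two copies of each file
git annex copy --to mybackup   # replicate the annexed content there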
And many other features
git-annex has tons of other features, so to some extent, the sky's the limit.
Automatic selection of what to add to git-annex vs plain git, encryption handling,
number of copies, clusters, computed files, etc. etc. etc. I still think it's
cool but too complex, though!
Uses
Aside from my blog, of course.
I've seen blog posts/comments about people using git-annex to track/store their
photo collection, and I could see very well how remote encrypted repos on any
of the services supported by rclone could be an N+2 copy or so. For me, tracking
photos would be a bit too tedious, but it could maybe work after more research.
A more practical thing would probably be replicating my local movie collection
(all legal, to be clear) better than just running rsync from time to time, and
tracking the large files in it via git-annex. That's an exercise for another
day, though, once I get more mileage with it - my blog pictures are copies, so I
don't care much if they get lost, but the movies are primary online copies, and I
don't want to re-dump the discs. Anyway, for later.
Migrating to git-annex
Migrating here means ending in a state where all large files are in git-annex,
and the plain git repo is small. Just moving the files to git-annex at the
current head doesn't remove them from history, so your git repository is still
large; it won't grow in the future, but it remains at its old size (and contains the
large files in its history).
In my mind, a nice migration would be: run a custom command, and all the history
is migrated to git-annex, so I can go back in time and still use git-annex.
I naïvely expected this would be easy and already available, only to find
comments on the git-annex site with unsure git-filter-branch calls and some
web discussions. This is the
discussion
on the git-annex website, but it didn't make me confident it would do the right
thing.
But that discussion is now 8 years old. Surely in 2025, with git-filter-repo,
it's easier? And, maybe I'm missing something, but it is not. The plain-git part
is easy; the problem is interacting with git-annex, which
stores its data in git itself, so doing this properly across successive steps of
a repo (when replaying the commits) is, I think, not well-defined behaviour.
So I was stuck here for a few days, until I got an epiphany: as I'm going to
rewrite the repository, of course I'm keeping a copy of it from before
git-annex. If so, I don't need the history, back in time, to be correct in the
sense of being able to retrieve the binary files too. It just needs to be
correct from the point of view of the actual Markdown and Haskell files that
represent the meat of the blog.
This simplified the problem a lot. At first, I wanted to just skip these files,
but this could also drop commits (git-filter-repo, by default, drops commits
if they're empty), and removing the files loses information - when they were
added, what the paths were, etc. So instead I came up with a rather clever idea,
if I may say so: since git-annex replaces files with symlinks already, just
replace the files with symlinks in the whole history, except these symlinks
are dangling (to represent the fact that the files are missing). One could also use
empty files, but empty files are more "valid", in a sense, than dangling symlinks,
hence I settled on the latter.
Doing this with git-filter-repo is easy, in newer versions, with the
new --file-info-callback. Here is a sketch of the simple code I used:
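(Reconstructed rather than verbatim: the directory prefixes are placeholders, and value.insert_file_with_contents comes from git-filter-repo's documented --file-info-callback interface, so double-check it against your installed version.)
git filter-repo --file-info-callback '
  # filename, mode, blob_id and value are supplied by git-filter-repo.
  # Leave everything outside the (placeholder) asset directories untouched.
  if not filename.startswith((b"images/", b"files/")):
      return (filename, mode, blob_id)
  # Create a new blob holding the symlink target; pointing it at a clearly
  # non-existent path makes the symlink dangle on purpose.
  target = b"/annexed-content-removed/" + filename
  new_blob_id = value.insert_file_with_contents(target)
  # 120000 is the git mode for symbolic links.
  return (filename, b"120000", new_blob_id)
'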
This goes and replaces files with a symlink to nowhere, but the symlink should
explain why it's dangling. Later renames or moves of the files then work
"naturally", as the rename/mv doesn't care about file contents. Then, when the
filtering is done:
copy the (binary) files back from the original repository
since they're named the same, and in the same places, git sees a type change
then simply run git annex add on those files
For me it was easy, as all such files were in a few directories, so just copying
those directories back, a few git annex add commands, and done.
Of course, I then added a few rsync remotes, ran git annex copy --to, and the
repository was ready.
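Put together, the post-filtering steps boil down to something like this (the directory names and remote name are placeholders; --remove-destination makes cp overwrite the dangling symlinks instead of trying to follow them):
cp -a --remove-destination ../blog-pre-annex/images ../blog-pre-annex/files .
git annex add images files         # git records a type change: dangling symlink -> annexed file
git commit -m "Move large files into git-annex"
git annex copy --to mybackup       # replicate the content to the special remote(s)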
Well, I also found a bug in my own Hakyll setup: on a fresh clone, when the
large files are just dangling symlinks, the builder doesn't complain, it just
ignores the images. Will have to fix that.
Other resources
This is a blog post that I read at the beginning, and I found it very useful as an
intro: https://switowski.com/blog/git-annex/. It didn't help me understand how
git-annex works under the covers, but it is well written. The author does use the
sync command, though, which is too magic for me, but he also agrees about its
complexity.
The proof is in the pudding
And now, for the first image actually added that never lived in the old
plain git repository. It's not full-res/full-size; it's cropped a bit at the
bottom.
Earlier in the year, I went to Paris for a very brief work trip, and I walked
around a bit; it was more beautiful than what I remembered from way, way back. So
a somewhat random selection of a picture, but here it is:
Un bateau sur la Seine
Enjoy!
Welcome to our 5th report from the Reproducible Builds project in 2025! Our monthly reports outline what we've been up to over the past month, and highlight items of news from elsewhere in the increasingly-important area of software supply-chain security. If you are interested in contributing to the Reproducible Builds project, please do visit the Contribute page on our website.
In this report:
Security audit of Reproducible Builds tools published
The Open Technology Fund's (OTF) security partner Security Research Labs recently conducted an audit of some specific parts of tools developed by Reproducible Builds. This form of security audit, sometimes called a whitebox audit, is a form of testing in which auditors have complete knowledge of the item being tested. The auditors assessed the various codebases for resilience against hacking, with key areas including differential report formats in diffoscope, common client web attacks, command injection, privilege management, hidden modifications in the build process and attack vectors that might enable denial of service.
The audit focused on three core Reproducible Builds tools: diffoscope, a Python application that unpacks archives of files and directories and transforms their binary formats into human-readable form in order to compare them; strip-nondeterminism, a Perl program that improves reproducibility by stripping out non-deterministic information such as timestamps or other elements introduced during packaging; and reprotest, a Python application that builds source code multiple times in various environments in order to test reproducibility.
OTF's announcement contains more of an overview of the audit, and the full 24-page report is available in PDF form as well.
[Colleagues] approached me to talk about a reproducibility issue they'd been having with some R code. They'd been running simulations that rely on generating samples from a multivariate normal distribution, and despite doing the prudent thing and using set.seed() to control the state of the random number generator (RNG), the results were not computationally reproducible. The same code, executed on different machines, would produce different random numbers. The numbers weren't just a little bit different in the way that we've all wearily learned to expect when you try to force computers to do mathematics. They were painfully, brutally, catastrophically, irreproducibly different. Somewhere, somehow, something broke.
present attestable builds, a new paradigm to provide strong source-to-binary correspondence in software artifacts. We tackle the challenge of opaque build pipelines that disconnect the trust between source code, which can be understood and audited, and the final binary artifact, which is difficult to inspect. Our system uses modern trusted execution environments (TEEs) and sandboxed build containers to provide strong guarantees that a given artifact was correctly built from a specific source code snapshot. As such it complements existing approaches like reproducible builds which typically require time-intensive modifications to existing build configurations and dependencies, and require independent parties to continuously build and verify artifacts.
The authors compare attestable builds with reproducible builds by noting that an attestable build "requires only minimal changes to an existing project, and offers nearly instantaneous verification of the correspondence between a given binary and the source code and build pipeline used to construct it", and proceed by determining that "the overhead (42 seconds start-up latency and 14% increase in build duration) is small in comparison to the overall build time".
Timo Pohl, Pavel Novák, Marc Ohm and Michael Meier have published a paper called Towards Reproducibility for Software Packages in Scripting Language Ecosystems. The authors note that past research into Reproducible Builds has focused primarily on compiled languages and their ecosystems, with a further emphasis on Linux distribution packages:
However, the popular scripting language ecosystems potentially face unique issues given the systematic difference in distributed artifacts. This Systemization of Knowledge (SoK) [paper] provides an overview of existing research, aiming to highlight future directions, as well as chances to transfer existing knowledge from compiled language ecosystems. To that end, we work out key aspects in current research, systematize identified challenges for software reproducibility, and map them between the ecosystems.
Ultimately, the three authors find that the literature is "sparse", focusing on a few individual problems and ecosystems, and therefore identify space for more critical research.
Distribution work
In Debian this month:
Ian Jackson filed a bug against the debian-policy package in order to delve into an issue affecting Debian's support for cross-architecture compilation, multiple-architecture systems, the reproducible builds SOURCE_DATE_EPOCH environment variable, and the ability to recompile already-uploaded packages to Debian with a new/updated toolchain (binNMUs). Ian identifies a specific case, in the libopts25-dev package, involving a manual page that had interesting downstream effects, potentially affecting backup systems. The bug generated a large number of replies, some of which reference similar or overlapping issues, such as this one from 2016/2017.
There is now a Reproducibility Status link for each app on f-droid.org, listed on every app's page. Our verification server shows one of two icons based on its build results: one meaning our rebuilder reproduced the same APK file, the other meaning it did not. The IzzyOnDroid repository has developed a more elaborate system of badges which displays one for each rebuilder. Additionally, there is a sketch of a five-level graph to represent some aspects of which processes were run.
Hans compares the approach with projects such as Arch Linux and Debian that provide developer-facing tools to give feedback about reproducible builds, but do not display information about reproducible builds in the user-facing interfaces like the package management GUIs.
Arnout Engelen of the NixOS project has been working on reproducing the minimal installation ISO image. This month, Arnout has successfully reproduced the build of the minimal image for the 25.05 release without relying on the binary cache. Work on also reproducing the graphical installer image is ongoing.
In openSUSE news, Bernhard M. Wiedemann posted another monthly update for their work there.
Lastly, in Fedora news, Jelle van der Waa opened issues tracking reproducibility issues in Haskell documentation, Qt6 recording the host kernel, and R packages recording the current date. The R packages can be made reproducible with packaging changes in Fedora.
diffoscope & disorderfs
diffoscope is our in-depth and content-aware diff utility that can locate and diagnose reproducibility issues. This month, Chris Lamb made the following changes, including preparing and uploading versions 295, 296 and 297 to Debian:
Don't rely on the zipdetails --walk argument being available, and only add that argument on newer versions after we test for that. []
Review and merge support for NuGet packages from Omair Majid. []
Update copyright years. []
Merge support for an lzma comparator from Will Hollywood. [][]
Chris also merged an impressive changeset from Siva Mahadevan to make disorderfs more portable, especially on FreeBSD. disorderfs is our FUSE-based filesystem that deliberately introduces non-determinism into directory system calls in order to flush out reproducibility issues []. This was then uploaded to Debian as version 0.6.0-1.
Lastly, Vagrant Cascadian updated diffoscope in GNU Guix to version 296 [][] and 297 [][], and disorderfs to version 0.6.0 [][].
Website updates
Once again, there were a number of improvements made to our website this month including:
Chris Lamb:
Merged four or five suggestions from Guillem Jover for the GNU Autotools examples on the SOURCE_DATE_EPOCH example page []
Incorporated a number of fixes for the JavaScript SOURCE_DATE_EPOCH snippet from Sebastian Davis, which did not handle non-integer values correctly. []
Removed the JavaScript example that uses a fixed timezone on the SOURCE_DATE_EPOCH page. []
Reproducibility testing framework
The Reproducible Builds project operates a comprehensive testing framework running primarily at tests.reproducible-builds.org in order to check packages and other artifacts for reproducibility.
However, Holger Levsen posted to our mailing list this month in order to bring a wider awareness to funding issues faced by the Oregon State University (OSU) Open Source Lab (OSL). As mentioned in OSL's public post, "recent changes in university funding makes our current funding model no longer sustainable [and that] unless we secure $250,000 in committed funds, the OSL will shut down later this year". As Holger notes in his post to our mailing list, the Reproducible Builds project relies on hardware nodes hosted there. Nevertheless, Lance Albertson of OSL posted an update to the funding situation later in the month with broadly positive news.
Separately, there were various changes to the Jenkins setup this month, which is used as the backend driver for both tests.reproducible-builds.org and reproduce.debian.net, including:
Migrating the central jenkins.debian.net server from AMD Opteron to Intel Haswell CPUs. Thanks to IONOS for hosting this server since 2012.
After testing it for almost ten years, the i386 architecture has been dropped from tests.reproducible-builds.org. This is because, with the upcoming release of Debian trixie, i386 is no longer supported as a regular architecture: there will be no official kernel and no Debian installer for i386 systems. As a result, a large number of nodes hosted by Infomaniak have been retooled from i386 to amd64.
Another node, ionos17-amd64.debian.net, which is used for verifying packages for all.reproduce.debian.net (hosted by IONOS) has had its memory increased from 40 to 64GB, and the number of cores doubled to 32 as well. In addition, two nodes generously hosted by OSUOSL have had their memory doubled to 16GB.
Lastly, we have been granted access to more riscv64 architecture boards, so now we have seven such nodes, all with 16GB memory and 4 cores that are verifying packages for riscv64.reproduce.debian.net. Many thanks to PLCT Lab, ISCAS for providing those.
Outside of this, a number of smaller changes were also made by Holger Levsen:
Disable testing of the i386 architecture. [][][][][]
Document the current disk usage. [][]
Address some image placement now that we only test three architectures. []
Keep track of build performance. []
Misc:
Fix a (harmless) typo in the multiarch_versionskew script. []
In addition, Jochen Sprickerhof made a series of changes related to reproduce.debian.net:
Add out of memory detection to the statistics page. []
Reverse the sorting order on the statistics page. [][][][]
Improve the spacing between statistics groups. []
Update a (hard-coded) line number in error message detection pertaining to a debrebuild line number. []
Support Debian unstable in the rebuilder-debian.sh script. []
Rely on rebuildctl to sync only arch-specific packages. [][]
Upstream patches
The Reproducible Builds project detects, dissects and attempts to fix as many currently-unreproducible packages as possible. This month, we wrote a large number of such patches, including:
0xFFFF: Use SOURCE_DATE_EPOCH for date in manual pages.
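For context, the usual pattern behind patches like this one (documented on the Reproducible Builds website) is to derive embedded dates from SOURCE_DATE_EPOCH when it is set and fall back to the current time otherwise; in a shell-driven build step that might look like:
# Use SOURCE_DATE_EPOCH if present, otherwise the current time (GNU date syntax).
BUILD_DATE=$(date -u -d "@${SOURCE_DATE_EPOCH:-$(date +%s)}" +%Y-%m-%d)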
Finally, if you are interested in contributing to the Reproducible Builds project, please visit our Contribute page on our website. You can also get in touch with us via:
Have you ever found yourself in the situation where you had no or
anonymized logs and still wanted to figure out where your traffic was
coming from?
Or you have multiple upstreams and are looking to see if you can save
fees by getting into peering agreements with some other party?
Or your site is getting heavy load but you can't pinpoint it on a
single IP and you suspect some amoral corporation is training their
degenerate AI on your content with a bot army?
(You might be getting onto something there.)
If that rings a bell, read on.
TL;DR:
... or just skip the cruft and install asncounter:
pip install asncounter
Also available in Debian 14 or later, or possibly in Debian 13
backports (soon to be released) if people are interested:
tcpdump -q -i eth0 -n -Q in "tcp and tcp[tcpflags] & tcp-syn != 0 and (port 80 or port 443)" | asncounter --input-format=tcpdump --repl
Read on for why this matters, and why I wrote yet another weird tool
(almost) from scratch.
Background and manual work
This is a tool I've been dreaming of for a long, long time. Back in
2006, at Koumbit, a colleague had set up TAS ("Traffic
Accounting System"; the name was originally Russian, apparently), a
collection of Perl scripts that would do per-IP accounting. It was
pretty cool: it would count bytes per IP address and, from that, you
could do analysis. But the project died, and it was kind of bespoke.
Fast forward twenty years, and I find myself fighting off bots at the
Tor Project (the irony...), with our GitLab suffering pretty bad
slowdowns (see issue tpo/tpa/team#41677 for the latest public
issue, the juicier one is confidential, unfortunately).
(We did have some issues caused by overloads in CI, as we host, after
all, a fork of Firefox, which is a massive repository, but the
applications team did sustained, awesome work to fix issues on that
side, again and again (see tpo/applications/tor-browser#43121 for
the latest, and tpo/applications/tor-browser#43121 for some
pretty impressive correlation work, I work with really skilled
people). But those issues, I believe, were fixed.)
So I had the feeling it was our turn to get hammered by the AI
bots. But how do we tell? I could tell something was hammering at
the costly /commit/ and (especially costly) /blame/ endpoints. So
at first, I pulled out the trusted awk | sort | uniq -c | sort -n |
tail pipeline I am sure others have worked out before:
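Something like the following, assuming combined-format logs with the client IP in the first field (the log path is whatever your web server writes to):
awk '{print $1}' /var/log/apache2/*access*.log | sort | uniq -c | sort -n | tail -10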
For people new to this, that pulls the first field out of web server
log files, sorts the list, counts the number of unique entries, and
sorts that so that the most common entries (or IPs) show up first,
then shows the top 10.
That, in other words, answers the question of "which IP address visits
this web server the most?" Based on this, I found a couple of IP
addresses that looked like Alibaba. I had already addressed an abuse
complaint to them (tpo/tpa/team#42152) but never got a response,
so I just blocked their entire network blocks, rather violently:
for cidr in 47.240.0.0/14 47.246.0.0/16 47.244.0.0/15 47.235.0.0/16 47.236.0.0/14; do
iptables-legacy -I INPUT -s $cidr -j REJECT
done
That made Ali Baba and his forty thieves (specifically their
AL-3 network) go away, but our load was still high, and I was
still seeing various IPs crawling the costly endpoints. And this time,
it was hard to tell who they were: you'll notice all the Alibaba IPs
are inside the same 47.0.0.0/8 prefix. Although it's not a /8
itself, it's all inside the same prefix, so it's visually easy to
pick it apart, especially for a brain like mine who's stared too long
at logs flowing by too fast for their own mental health.
What I had then was different, and I was tired of doing the stupid
thing I had been doing for decades at this point. I had
stumbled upon pyasn recently (in January, according to my notes)
and somehow found it again, and thought "I bet I could write a quick
script that loops over IPs and counts IPs per ASN".
(Obviously, there are lots of other tools out there for that kind of
monitoring. Argos, for example, presumably does this, but it's a kind
of a huge stack. You can also get into netflows, but there's serious
privacy implications with those. There are also lots of per-IP
counters like promacct, but that doesn't scale.
Or maybe someone already had solved this problem and I just wasted a
week of my life, who knows. Someone will let me know, I hope, either
way.)
ASNs and networks
A quick aside, for people not familiar with how the internet
works. People that know about ASNs, BGP announcements and so on can
skip.
The internet is the network of networks. It's made of multiple
networks that talk to each other. The way this works is that there is the
Border Gateway Protocol (BGP), a relatively simple TCP-based protocol,
that the edge routers of those networks use to announce to each other
which networks they manage. Each of those networks is called an
Autonomous System (AS) and has an AS number (ASN) to uniquely identify
it. Just like IP addresses, ASNs are allocated by IANA and local
registries; they're pretty cheap and useful, so if you like running your
own routers, get one.
When you have an ASN, you'll use it to, say, announce to your BGP
neighbors "I have 198.51.100.0/24 over here", and the others might
say "okay, and I have 216.90.108.31/19 over here, and I know of this
other ASN over there that has 192.0.2.1/24 too!" And gradually, those
announcements flood the entire network, and you end up with each BGP
router having a routing table of the global internet, with a map of which
network block, or "prefix", is announced by which ASN.
It's how the internet works, and it's a useful thing to know, because
it's what, ultimately, makes an organisation responsible for an IP
address. There are "looking glass" tools like the one provided by
routeviews.org which allow you to effectively run "trace routes"
(but not the same as traceroute, which actively sends probes from
your location), type an IP address in that form to fiddle with it. You
will end up with an "AS path", the way to get from the looking glass
to the announced network. But I digress, and that's kind of out of
scope.
Point is, the internet is made of networks, networks are autonomous
systems (AS) and they have numbers (ASNs), and they announce IP
prefixes (or "network blocks") that ultimately tell you who is
responsible for traffic on the internet.
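As an aside, if you ever want to do that IP-to-ASN lookup by hand for a single address, one common way is Team Cymru's whois service (the address below is just a documentation-range example):
whois -h whois.cymru.com " -v 203.0.113.7"   # prints the origin ASN, prefix and AS name for that IP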
Introducing asncounter
So my goal was to get from "lots of IP addresses" to "list of ASNs",
possibly also the list of prefixes (because why not). Turns out pyasn
makes that really easy. I managed to build a prototype in probably
less than an hour, just look at the first version, it's 44 lines
(sloccount) of Python, and it works, provided you have already
downloaded the required datafiles from routeviews.org. (Obviously, the
latest version is longer at close to 1000 lines, but it downloads the
data files automatically, and has many more features).
The way the first prototype (and later versions too, mostly) worked is
that you feed it a list of IP addresses on standard input, it looks up
the ASN and prefix associated with the IP, and increments a counter
for those, then prints the result.
That showed me something like this:
root@gitlab-02:~/anarcat-scripts# tcpdump -q -i eth0 -n -Q in "(udp or tcp)" | ./asncounter.py --tcpdump
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), snapshot length 262144 bytes
INFO: collecting IPs from stdin, using datfile ipasn_20250523.1600.dat.gz
INFO: loading datfile /root/.cache/pyasn/ipasn_20250523.1600.dat.gz...
INFO: loading /root/.cache/pyasn/asnames.json
ASN count AS
136907 7811 HWCLOUDS-AS-AP HUAWEI CLOUDS, HK
[----] 359 [REDACTED]
[----] 313 [REDACTED]
8075 254 MICROSOFT-CORP-MSN-AS-BLOCK, US
[---] 164 [REDACTED]
[----] 136 [REDACTED]
24940 114 HETZNER-AS, DE
[----] 98 [REDACTED]
14618 82 AMAZON-AES, US
[----] 79 [REDACTED]
prefix count
166.108.192.0/20 1294
188.239.32.0/20 1056
166.108.224.0/20 970
111.119.192.0/20 951
124.243.128.0/18 667
94.74.80.0/20 651
111.119.224.0/20 622
111.119.240.0/20 566
111.119.208.0/20 538
[REDACTED] 313
Even without ratios and a total count (which will come later), it was
quite clear that Huawei was doing something big on the server. At that
point, it was responsible for a quarter to half of the traffic on our
GitLab server or about 5-10 queries per second.
But just looking at the logs, or per IP hit counts, it was really hard
to tell. That traffic is really well distributed. If you look more
closely at the output above, you'll notice I redacted a couple of
entries except major providers, for privacy reasons. But you'll also
notice almost nothing is redacted in the prefix list, why? Because
all of those networks are Huawei! Their announcements are kind of
bonkers: they have hundreds of such prefixes.
Now, clever people in the know will say "of course they do, it's a
hyperscaler; just for ASN14618 (AMAZON-AES) there are way more
announcements, they have 1416 prefixes!" Yes, of course, but they are
not generating half of my traffic (at least, not yet). But even then:
this also applies to Amazon! This way of counting traffic is way
more useful for large scale operations like this, because you group by
organisation instead of by server or individual endpoint.
And, ultimately, this is why asncounter matters: it allows you to
group your traffic by organisation, the place you can actually
negotiate with.
Now, of course, that assumes those are entities you can talk with. I
have written to both Alibaba and Huawei, and have yet to receive a
response. I assume I never will. In their defence, I wrote in English,
perhaps I should have made the effort of translating my message into
Chinese, but then again English is the Lingua Franca of the
Internet, and I doubt that's actually the issue.
The Huawei and Facebook blocks
Another aside, because this is my blog and I am not looking for a
Pulitzer here.
So I blocked Huawei from our GitLab server (and before you tear your
shirt open: only our GitLab server, everything else is still
accessible to them, including our email server to respond to my
complaint). I did so 24h after emailing them, and after examining
their user agent (UA) headers. Boy that was fun. In a sample of 268
requests I analyzed, they churned out 246 different UAs.
At first glance, they looked legit, like:
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/131.0.0.0 Safari/537.36
Safari on a Mac, so far so good. But when you start digging, you
notice some strange things, like here's Safari running on Linux:
Mozilla/5.0 (X11; U; Linux i686; en-US) AppleWebKit/534.3 (KHTML, like Gecko) Chrome/6.0.457.0 Safari/534.3
Was Safari ported to Linux? I guess that's.. possible?
But here is Safari running on a 15 year old Ubuntu release (10.10):
Mozilla/5.0 (X11; Linux i686) AppleWebKit/534.24 (KHTML, like Gecko) Ubuntu/10.10 Chromium/12.0.702.0 Chrome/12.0.702.0 Safari/534.24
Speaking of old, here's Safari again, but this time running on Windows
NT 5.1, AKA Windows XP, released 2001, EOL since 2019:
Mozilla/5.0 (Windows; U; Windows NT 5.1; en-CA) AppleWebKit/534.13 (KHTML like Gecko) Chrome/9.0.597.98 Safari/534.13
Really?
Here's Firefox 3.6, released 14 years ago, there were quite a lot of
those:
Mozilla/5.0 (Windows; U; Windows NT 6.1; lt; rv:1.9.2) Gecko/20100115 Firefox/3.6
I remember running those old Firefox releases, those were the days.
But to me, those look like entirely fake UAs, deliberately rotated to
make it look like legitimate traffic.
In comparison, Facebook seemed a bit more legit, in the sense that
they don't fake it: most hits are from:
crawls the web for use cases such as training AI models or improving products by indexing content directly
From what I could tell, it was even respecting our rather liberal
robots.txt rules, in that it wasn't crawling the sprawling /blame/
or /commit/ endpoints, explicitly forbidden by robots.txt.
So I've blocked the Facebook bot in robots.txt and, amazingly, it
just went away. Good job Facebook, as much as I think you've given the
empire to neo-nazis, cause depression and genocide, you know how to
run a crawler, thanks.
Huawei was blocked at the web server level, with a friendly 429 status
code telling people to contact us (over email) if they need help. And
they don't care: they're still hammering the server, from what I can
tell, but then again, I didn't block the entire ASN just yet, just the
blocks I found crawling the server over a couple hours.
A full asncounter run
So what does a day in asncounter look like? Well, you start with a
problem, say you're getting too much traffic and want to see where
it's from. First you need to sample it. Typically, you'd do that with
tcpdump or tailing a log file:
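For example, either of these feeds asncounter a stream of addresses (the interface and log path are placeholders; the tcpdump variant matches the --input-format=tcpdump mode shown earlier):
tcpdump -q -n -i eth0 -Q in "(udp or tcp)" | asncounter --input-format=tcpdump
tail -F /var/log/apache2/access.log | awk '{print $1}' | asncounter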
If you really get a lot of traffic, you might want to get a subset
of that to avoid overwhelming asncounter, it's not fast enough to do
multiple gigabit/second, I bet, so here's only incoming SYN IPv4
packets:
tcpdump -q -n -Q in "tcp and tcp[tcpflags] & tcp-syn != 0 and (port 80 or port 443)" | asncounter --input-format=tcpdump --repl
In any case, at this point you're staring at a process, just sitting
there. If you passed the --repl or --manhole arguments, you're
lucky: you have a Python shell inside the program. Otherwise, send
SIGHUP to the thing to have it dump the nice tables out:
pkill -HUP asncounter
Here's an example run:
> awk '{print $2}' /var/log/apache2/*access*.log | asncounter
INFO: using datfile ipasn_20250527.1600.dat.gz
INFO: collecting addresses from <stdin>
INFO: loading datfile /home/anarcat/.cache/pyasn/ipasn_20250527.1600.dat.gz...
INFO: finished reading data
INFO: loading /home/anarcat/.cache/pyasn/asnames.json
count percent ASN AS
12779 69.33 66496 SAMPLE, CA
3361 18.23 None None
366 1.99 66497 EXAMPLE, FR
337 1.83 16276 OVH, FR
321 1.74 8075 MICROSOFT-CORP-MSN-AS-BLOCK, US
309 1.68 14061 DIGITALOCEAN-ASN, US
128 0.69 16509 AMAZON-02, US
77 0.42 48090 DMZHOST, GB
56 0.3 136907 HWCLOUDS-AS-AP HUAWEI CLOUDS, HK
53 0.29 17621 CNCGROUP-SH China Unicom Shanghai network, CN
total: 18433
count percent prefix ASN AS
12779 69.33 192.0.2.0/24 66496 SAMPLE, CA
3361 18.23 None
298 1.62 178.128.208.0/20 14061 DIGITALOCEAN-ASN, US
289 1.57 51.222.0.0/16 16276 OVH, FR
272 1.48 2001:DB8::/48 66497 EXAMPLE, FR
235 1.27 172.160.0.0/11 8075 MICROSOFT-CORP-MSN-AS-BLOCK, US
94 0.51 2001:DB8:1::/48 66497 EXAMPLE, FR
72 0.39 47.128.0.0/14 16509 AMAZON-02, US
69 0.37 93.123.109.0/24 48090 DMZHOST, GB
53 0.29 27.115.124.0/24 17621 CNCGROUP-SH China Unicom Shanghai network, CN
Those numbers are actually from my home network, not GitLab. Over
there, the battle still rages on, but at least the vampire bots are
banging their heads against the solid Nginx wall instead of eating the
fragile heart of GitLab. We had a significant improvement in latency
thanks to the Facebook and Huawei blocks... Here are the "workhorse
request duration stats" for various time ranges, 20h after the block:
range   mean    max     stdev
20h     449ms   958ms   39ms
7d      1.78s   5m      14.9s
30d     2.08s   3.86m   8.86s
6m      901ms   27.3s   2.43s
We went from two seconds mean to 500ms! And look at that standard deviation!
39ms! It was ten seconds before! I doubt we'll keep it that way very
long but for now, it feels like I won a battle, and I didn't even have
to set up anubis or go-away, although I suspect that will
unfortunately come.
Note that asncounter also supports exporting Prometheus metrics, but
you should be careful with this, as it can lead to cardinality explosion,
especially if you track by prefix (which can be disabled with
--no-prefixes).
Folks interested in more details should read the fine manual for
more examples, usage, and discussion. It shows, among other things,
how to effectively block lots of networks from Nginx, aggregate
multiple prefixes, block entire ASNs, and more!
So there you have it: I now have the tool I wish I had 20 years
ago. Hopefully it will stay useful for another 20 years, although I'm
not sure we'll still have an internet in 20
years.
I welcome constructive feedback, "oh no you rewrote X", Grafana
dashboards, bug reports, pull requests, and "hell yeah"
comments. Hacker News, let it rip, I know you can give me another
juicy quote for my blog.
This work was done as part of my paid work for the Tor Project,
currently in a fundraising drive, give us money if you like what you
read.
I've been part of the Debian Project since 2019, when I attended DebConf held in Curitiba, Brazil. That event sparked my interest in the community, packaging, and how Debian works as a distribution.
In the early years of my involvement, I contributed to various teams such as the Python, Golang and Cloud teams, packaging dependencies and maintaining various tools. However, I soon felt the need to focus on packaging software I truly enjoyed, tools I was passionate about using and maintaining.
That's when I turned my attention to Kubernetes within Debian.
A Broken Ecosystem
The Kubernetes packaging situation in Debian had been problematic for some time. Given its large codebase and complex dependency tree, the initial packaging approach involved vendorizing all dependencies. While this allowed a somewhat functional package to be published, it introduced several long-term issues, especially security concerns.
Vendorized packages bundle third-party dependencies directly into the source tarball. When vulnerabilities arise in those dependencies, it becomes difficult for Debian's security team to patch and rebuild affected packages system-wide. This approach broke Debian's best practices, and it eventually led to the abandonment of the Kubernetes source package, which had stalled at version 1.20.5.
Due to this abandonment, critical bugs emerged and the package was removed from Debian's testing channel, as we can see in the package tracker.
New Debian Kubernetes Team
Around this time, I became a Debian Maintainer (DM), with permissions to upload certain packages. I saw an opportunity to both contribute more deeply to Debian and to fix Kubernetes packaging.
In early 2024, just before DebConf Busan in South Korea, I founded the Debian Kubernetes Team. The mission of the team was to repackage Kubernetes in a maintainable, security-conscious, and Debian-compliant way. At DebConf, I shared our progress with the broader community and received great feedback and more visibility, along with people interested in contributing to the team.
Our first task was to migrate existing Kubernetes-related tools such as kubectx, kubernetes-split-yaml and kubetail into a dedicated namespace on Salsa, Debian's GitLab instance.
Many of these tools were stored across different teams (like the Go team), and consolidating them helped us organize development and focus our efforts.
De-vendorizing Kubernetes
Our main goal was to un-vendorize Kubernetes and bring it up-to-date with upstream releases.
This meant:
Removing the vendor directory and all embedded third-party code.
Trimming the build scope to focus solely on building kubectl, the Kubernetes CLI.
Using Files-Excluded in debian/copyright to cleanly drop unneeded files during source imports.
Rebuilding the dependency tree, ensuring all Go modules were separately packaged in Debian.
We used uscan, a standard Debian packaging tool that fetches upstream tarballs and prepares them accordingly. The Files-Excluded directive in our debian/copyright file instructed uscan to automatically remove unnecessary files during the repackaging process:
$ uscan
Newest version of kubernetes on remote site is 1.32.3, specified download version is 1.32.3
Successfully repacked ../v1.32.3 as ../kubernetes_1.32.3+ds.orig.tar.gz, deleting 30616 files from it.
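For reference, the corresponding header paragraph of debian/copyright looks roughly like this (the exclusion patterns shown are illustrative, not the package's exact list):
Format: https://www.debian.org/doc/packaging-manuals/copyright-format/1.0/
Upstream-Name: kubernetes
Source: https://github.com/kubernetes/kubernetes
Files-Excluded: vendor
                third_party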
The results were dramatic. By comparing the original upstream tarball with our repackaged version, we can see that our approach reduced the tarball size by over 75%.
This significant reduction wasn't just about saving space. By removing over 30,000 files, we simplified the package, making it more maintainable. Each dependency could now be properly tracked, updated, and patched independently, resolving the security concerns that had plagued the previous packaging approach.
Dependency Graph
To give you an idea of the complexity involved in packaging Kubernetes for Debian, the image below is a dependency graph generated with debtree, visualizing all the Go modules and other dependencies required to build the kubectl binary.
This web of nodes and edges represents every module and its relationship during the compilation process of kubectl. Each box is a Debian package, and the lines connecting them show how deeply intertwined the ecosystem is. What might look like a mess of blue spaghetti is actually a clear demonstration of the vast and interconnected upstream world that tools like kubectl rely on.
But more importantly, this graph is a testament to the effort that went into making kubectl build entirely using Debian-packaged dependencies only, no vendoring, no downloading from the internet, no proprietary blobs.
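For the curious, a graph like that can be generated with something along these lines (the source package name and debtree's build-dependency option are assumptions here; adjust to your setup):
debtree --build-dep kubernetes | dot -Tsvg -o kubectl-deps.svg   # render the build-dependency graph with Graphviz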
Upstream Version 1.32.3 and Beyond
After nearly two years of work, we successfully uploaded version 1.32.3+ds of kubectl to Debian unstable.
kubernetes/-/merge_requests/1
This upload closed over a dozen long-standing bugs and brought, among other things:
Zsh, Fish, and Bash completions installed automatically
Man pages and metadata for improved discoverability
Full integration with kind and docker for testing purposes
Integration Testing with Autopkgtest
To ensure the reliability of kubectl in real-world scenarios, we developed a new autopkgtest suite that runs integration tests using real Kubernetes clusters created via Kind.
Autopkgtest is a Debian tool used to run automated tests on binary packages. These tests are executed after the package is built but before it's accepted into the Debian archive, helping catch regressions and integration issues early in the packaging pipeline.
Our test workflow validates kubectl by performing the following steps:
Installing Kind and Docker as test dependencies.
Spinning up two local Kubernetes clusters.
Switching between cluster contexts to ensure multi-cluster support.
Deploying and scaling a sample nginx application using kubectl.
Cleaning up the entire test environment to avoid side effects.
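Stripped of the autopkgtest harness, a run boils down to something like the following (the cluster and deployment names are arbitrary):
kind create cluster --name ci-a
kind create cluster --name ci-b
kubectl config use-context kind-ci-b            # exercise multi-cluster context switching
kubectl create deployment web --image=nginx
kubectl scale deployment web --replicas=3
kubectl wait --for=condition=Available deployment/web --timeout=120s
kind delete cluster --name ci-a
kind delete cluster --name ci-b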
Popcon: Measuring Adoption
To measure real-world usage, we rely on data from Debian's popularity contest (popcon), which gives insight into how many users have each binary installed.
Here's what the data tells us:
kubectl (new binary): Already installed on 2,124 systems.
golang-k8s-kubectl-dev: This is the Go development package (a library), useful for other packages and developers who want to interact with Kubernetes programmatically.
kubernetes-client: The legacy package that kubectl is replacing. We expect this number to decrease in future releases as more systems transition to the new package.
Although the popcon data shows activity for kubectl before the official Debian upload date, it's important to note that those numbers represent users who had it installed from upstream source-lists, not from the Debian repositories. This distinction underscores a demand that existed even before the package was available in Debian proper, and it validates the importance of bringing it into the archive.
Also worth mentioning: this number is not the real total number of installations, since users can choose not to participate in the popularity contest. So the actual adoption is likely higher than what popcon reflects.
Community and Documentation
The team also maintains a dedicated wiki page which documents:
Looking Ahead to Debian 13 (Trixie)
The next stable release of Debian will ship with kubectl version 1.32.3, built from a clean, de-vendorized source. This version includes nearly all the latest upstream features, and will be the first time in years that Debian users can rely on an up-to-date, policy-compliant kubectl directly from the archive.
Compared with upstream, our Debian package even delivers more out of the box, including shell completions, which upstream still requires users to generate manually.
In 2025, the Debian Kubernetes team will continue expanding our packaging efforts for the Kubernetes ecosystem.
Our roadmap includes:
kubelet: The primary node agent that runs on each node. This will enable Debian users to create fully functional Kubernetes nodes without relying on external packages.
kubeadm: A tool for creating Kubernetes clusters. With kubeadm in Debian, users will then be able to bootstrap minimum viable clusters directly from the official repositories.
helm: The package manager for Kubernetes that helps manage applications through Kubernetes YAML files defined as charts.
kompose: A conversion tool that helps users familiar with docker-compose move to Kubernetes by translating Docker Compose files into Kubernetes resources.
Final Thoughts
This journey was only possible thanks to the amazing support of the debian-devel-br community and the collective effort of contributors who stepped up to package missing dependencies, fix bugs, and test new versions.
Special thanks to:
Carlos Henrique Melara (@charles)
Guilherme Puida (@puida)
João Pedro Nobrega (@jnpf)
Lucas Kanashiro (@kanashiro)
Matheus Polkorny (@polkorny)
Samuel Henrique (@samueloph)
Sergio Cipriano (@cipriano)
Sergio Durigan Junior (@sergiodj)
I look forward to continuing this work, bringing more Kubernetes tools into Debian and improving the developer experience for everyone.
I've been refreshing myself on the low-level guts of Linux
container technology. Here's some notes on mount namespaces.
In the below examples, I will use more than one root shell
simultaneously. To disambiguate them, the examples will feature
a numbered shell prompt: 1# for the first shell, and 2# for
the second.
Preliminaries
Namespaces are normally associated with processes and are
removed when the last associated process terminates. To make
them persistent, you have to bind-mount the corresponding
virtual file from an associated process's entry in /proc
to another path1.
The receiving path needs to have its "propagation" property set to "private".
Most likely your system's existing mounts are mostly "shared". You can check
the propagation setting for mounts with
1# findmnt -o+PROPAGATION
We'll create a new directory to hold mount namespaces we create,
and set its Propagation to private, via a bind-mount of itself
to itself.
1# mkdir /root/mntns
1# mount --bind --make-private /root/mntns /root/mntns
The namespace itself needs to be bind-mounted over a file rather
than a directory, so we'll create one.
1# touch /root/mntns/1
Creating and persisting a new mount namespace
1# unshare --mount=/root/mntns/1
We are now 'inside' the new namespace in a new shell process.
We'll change the shell prompt to make this clearer
PS1='inside# '
We can make a filesystem change, such as mounting a tmpfs
inside# mount -t tmpfs /mnt /mnt
inside# touch /mnt/hi-there
And observe it is not visible outside that namespace
2# findmnt /mnt
2# stat /mnt/hi-there
stat: cannot statx '/mnt/hi-there': No such file or directory
Back to the namespace shell, we can find an integer identifier for
the namespace via the shell process's /proc entry:
inside# readlink /proc/$$/ns/mnt
It will be something like mnt:[4026533646].
From another shell, we can list namespaces and see that it
exists:
2# lsns -t mnt
NS TYPE NPROCS PID USER COMMAND
4026533646 mnt 1 52525 root -bash
If we exit the shell that unshare created,
inside# exit
running lsns again should2 still list the namespace,
albeit with the NPROCS column now reading 0.
2# lsns -t mnt
We can see that a virtual filesystem of type nsfs is mounted at
the path we selected when we ran unshare:
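For example, checking the bind-mount target we created earlier should show a filesystem of type nsfs; the exact column layout below is only illustrative and varies with the findmnt version:
2# findmnt /root/mntns/1
TARGET        SOURCE                 FSTYPE OPTIONS
/root/mntns/1 nsfs[mnt:[4026533646]] nsfs   rw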
In this post, I demonstrate the optimal workflow for creating new Debian packages in 2025, preserving the upstream git history. The motivation for this is to lower the barrier for sharing improvements to and from upstream, and to improve software provenance and supply-chain security by making it easy to inspect every change at any level using standard git tooling.
Key elements of this workflow include:
Using a Git fork/clone of the upstream repository as the starting point for creating Debian packaging repositories.
Consistent use of the same git-buildpackage commands, with all package-specific options in gbp.conf.
Pristine-tar and upstream signatures for supply-chain security.
Use of Files-Excluded in the debian/copyright file to filter out unwanted files in Debian.
Patch queues to easily rebase and cherry-pick changes across Debian and upstream branches.
Efficient use of Salsa, Debian's GitLab instance, for both automated feedback from CI systems and human feedback from peer reviews.
To make the instructions so concrete that anyone can repeat all the steps themselves on a real package, I demonstrate the steps by packaging the command-line tool Entr. It is written in C, has very few dependencies, and its final Debian source package structure is simple, yet exemplifies all the important parts that go into a complete Debian package:
Creating a new packaging repository and publishing it under your personal namespace on salsa.debian.org.
Using dh_make to create the initial Debian packaging.
Posting the first draft of the Debian packaging as a Merge Request (MR) and using Salsa CI to verify Debian packaging quality.
Running local builds efficiently and iterating on the packaging process.
Create new Debian packaging repository from the existing upstream project git repository
First, create a new empty directory, then clone the upstream Git repository inside it:
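A sketch of these commands, assuming the upstream Entr repository lives at https://github.com/eradman/entr (adjust the URL for the project you are packaging; the directory name entr-packaging is arbitrary):
mkdir entr-packaging
cd entr-packaging
git clone --origin upstreamvcs --branch master \
  --single-branch https://github.com/eradman/entr.git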
Using a clean directory makes it easier to inspect the build artifacts of a Debian package, which will be output in the parent directory of the Debian source directory.
The extra parameters given to git clone lay the foundation for the Debian packaging git repository structure where the upstream git remote name is upstreamvcs. Only the upstream main branch is tracked to avoid cluttering git history with upstream development branches that are irrelevant for packaging in Debian.
Next, enter the git repository directory and list the git tags. Pick the latest upstream release tag as the commit to start the branch upstream/latest. This latest refers to the upstream release, not the upstream development branch. Immediately after, branch off the debian/latest branch, which will have the actual Debian packaging files in the debian/ subdirectory.
cd entr
git tag # shows the latest upstream release tag was '5.6'
git checkout -b upstream/latest 5.6
git checkout -b debian/latest
At this point, the repository is structured according to DEP-14 conventions, ensuring a clear separation between upstream and Debian packaging changes, but there are no Debian changes yet. Next, add the Salsa repository as a new remote called origin, the same as the default remote name in git.
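As a sketch, assuming the repository has been created under your personal namespace on Salsa (replace <your-username> with your account name):
git remote add origin git@salsa.debian.org:<your-username>/entr.git
git push --set-upstream origin upstream/latest debian/latest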
This is an important preparation step to later be able to create a Merge Request on Salsa that targets the debian/latest branch, which does not yet have any debian/ directory.
Launch a Debian Sid (unstable) container to run builds in
To ensure that all packaging tools are of the latest versions, run everything inside a fresh Sid container. This has two benefits: you are guaranteed to have the most up-to-date toolchain, and your host system stays clean without getting polluted by various extra packages. Additionally, this approach works even if your host system is not Debian/Ubuntu.
cd ..
podman run --interactive --tty --rm --shm-size=1G --cap-add SYS_PTRACE \
--env='DEB*' --volume=$PWD:/tmp/test --workdir=/tmp/test debian:sid bash
Note that the container should be started from the parent directory of the git repository, not inside it. The --volume parameter will loop-mount the current directory inside the container. Thus all files created and modified are on the host system, and will persist after the container shuts down.
Once inside the container, install the basic dependencies:
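One plausible minimal set of packages for the steps in this post (adjust to your needs) would be:
apt update -q
apt install -y --no-install-recommends git git-buildpackage dh-make \
  dpkg-dev devscripts apt-file lintian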
Automate creating the debian/ files with dh-make
To create the files needed for the actual Debian packaging, use dh_make:
# dh_make --packagename entr_5.6 --single --createorig
Maintainer Name : Otto Kekäläinen
Email-Address : otto@debian.org
Date : Sat, 15 Feb 2025 01:17:51 +0000
Package Name : entr
Version : 5.6
License : blank
Package Type : single
Are the details correct? [Y/n/q]
Done. Please edit the files in the debian/ subdirectory now.
Due to how dh_make works, the package name and version need to be written as a single underscore separated string. In this case, you should choose --single to specify that the package type is a single binary package. Other options would be --library for library packages (see libgda5 sources as an example) or --indep (see dns-root-data sources as an example). The --createorig will create a mock upstream release tarball (entr_5.6.orig.tar.xz) from the current release directory, which is necessary due to historical reasons and how dh_make worked before git repositories became common and Debian source packages were based off upstream release tarballs (e.g. *.tar.gz).
At this stage, a debian/ directory has been created with template files, and you can start modifying the files and iterating towards actual working packaging.
git add debian/
git commit -a -m "Initial Debian packaging"
Review the files
The full list of files after the above steps with dh_make would be:
You can browse these files in the demo repository.
The mandatory files in the debian/ directory are:
changelog,
control,
copyright,
and rules.
All the other files have been created for convenience so the packager has template files to work from. The files with the suffix .ex are example files that won't have any effect until their content is adjusted and the suffix removed.
For detailed explanations of the purpose of each file in the debian/ subdirectory, see the following resources:
The Debian Policy Manual: Describes the structure of the operating system, the package archive and requirements for packages to be included in the Debian archive.
The Developer's Reference: A collection of best practices and process descriptions Debian packagers are expected to follow while interacting with one another.
Debhelper man pages: Detailed information of how the Debian package build system works, and how the contents of the various files in debian/ affect the end result.
As Entr, the package used in this example, is a real package that already exists in the Debian archive, you may want to browse the actual Debian packaging source at https://salsa.debian.org/debian/entr/-/tree/debian/latest/debian for reference.
Most of these files have standardized formatting conventions to make collaboration easier. To automatically format the files following the most popular conventions, simply run wrap-and-sort -vast or debputy reformat --style=black.
Identify build dependencies
The most common reason for builds to fail is missing dependencies. The easiest way to identify which Debian package ships the required dependency is using apt-file. If, for example, a build fails complaining that pcre2posix.h cannot be found or that libpcre2-posix.so is missing, you can use these commands:
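A sketch of that apt-file usage, assuming apt-file is installed and its cache has been updated; the output shown is illustrative for amd64:
$ apt-file search pcre2posix.h
libpcre2-dev: /usr/include/pcre2posix.h
$ apt-file search libpcre2-posix.so
libpcre2-dev: /usr/lib/x86_64-linux-gnu/libpcre2-posix.so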
The output above implies that the debian/control should be extended to define a Build-Depends: libpcre2-dev relationship.
There is also dpkg-depcheck that uses strace to trace the files the build process tries to access, and lists what Debian packages those files belong to. Example usage:
dpkg-depcheck -b debian/rules build
Build the Debian sources to generate the .deb package
After the first pass of refining the contents of the files in debian/, test the build by running dpkg-buildpackage inside the container:
dpkg-buildpackage -uc -us -b
The options -uc -us will skip signing the resulting Debian source package and other build artifacts. The -b option will skip creating a source package and only build the (binary) *.deb packages.
The output is very verbose and gives a large amount of context about what is happening during the build to make debugging build failures easier. In the build log of entr you will see for example the line dh binary --buildsystem=makefile. This and other dh commands can also be run manually if there is a need to quickly repeat only a part of the build while debugging build failures.
To see what files were generated or modified by the build simply run git status --ignored:
$ git status --ignored
On branch debian/latest
Untracked files:
(use "git add <file>..." to include in what will be committed)
debian/debhelper-build-stamp
debian/entr.debhelper.log
debian/entr.substvars
debian/files
Ignored files:
(use "git add -f <file>..." to include in what will be committed)
Makefile
compat.c
compat.o
debian/.debhelper/
debian/entr/
entr
entr.o
status.o
Re-running dpkg-buildpackage will include running the command dh clean, which, assuming it is configured correctly in the debian/rules file, will reset the source directory to the original pristine state. The same can of course also be done with the regular git commands git reset --hard; git clean -fdx. To avoid accidentally committing unnecessary build artifacts in git, a debian/.gitignore file can be useful; it would typically include all four files listed as untracked above.
After a successful build you would have the following files:
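The exact listing depends on the package version, revision and architecture, but for this example the parent directory would typically contain something along these lines, plus the staged package contents under debian/entr/ inside the source tree:
../entr_5.6-1_amd64.deb
../entr-dbgsym_5.6-1_amd64.deb
../entr_5.6-1_amd64.buildinfo
../entr_5.6-1_amd64.changes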
The contents of debian/entr are essentially what goes into the resulting entr_5.6-1_amd64.deb package. Familiarizing yourself with the majority of the files in the original upstream source as well as all the resulting build artifacts is time consuming, but it is a necessary investment to get high-quality Debian packages.
There are also tools such as Debcraft that automate generating the build artifacts in separate output directories for each build, thus making it easy to compare the changes to correlate what change in the Debian packaging led to what change in the resulting build artifacts.
Re-run the initial import with git-buildpackage
When upstreams publish releases as tarballs, they should also be imported for optimal software supply-chain security, in particular if upstream also publishes cryptographic signatures that can be used to verify the authenticity of the tarballs.
To achieve this, the files debian/watch, debian/upstream/signing-key.asc, and debian/gbp.conf need to be present with the correct options. In the gbp.conf file, ensure you have the correct options based on:
Does upstream release tarballs? If so, enforce pristine-tar = True.
Does upstream sign the tarballs? If so, configure explicit signature checking with upstream-signatures = on.
Does upstream have a git repository, and does it have release git tags? If so, configure the release git tag format, e.g. upstream-vcs-tag = %(version%~%.)s.
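A minimal debian/gbp.conf reflecting the options above could look roughly like this; treat it as a sketch and adjust the branch names and options to the upstream project at hand:
[DEFAULT]
debian-branch = debian/latest
upstream-branch = upstream/latest
pristine-tar = True
upstream-vcs-tag = %(version%~%.)s

[import-orig]
upstream-signatures = on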
To validate that the above files are working correctly, run gbp import-orig with the current version explicitly defined:
$ gbp import-orig --uscan --upstream-version 5.6
gbp:info: Launching uscan...
gpgv: Signature made 7. Aug 2024 07.43.27 PDT
gpgv: using RSA key 519151D83E83D40A232B4D615C418B8631BC7C26
gpgv: Good signature from "Eric Radman <ericshane@eradman.com>"
gbp:info: Using uscan downloaded tarball ../entr_5.6.orig.tar.gz
gbp:info: Importing '../entr_5.6.orig.tar.gz' to branch 'upstream/latest'...
gbp:info: Source package is entr
gbp:info: Upstream version is 5.6
gbp:info: Replacing upstream source on 'debian/latest'
gbp:info: Running Postimport hook
gbp:info: Successfully imported version 5.6 of ../entr_5.6.orig.tar.gz
As the original packaging was done based on the upstream release git tag, the above command will fetch the tarball release, create the pristine-tar branch, and store the tarball delta on it. This command will also attempt to create the tag upstream/5.6 on the upstream/latest branch.
Import new upstream versions in the future
Forking the upstream git repository, creating the initial packaging, and creating the DEP-14 branch structure are all one-off work needed only when creating the initial packaging.
Going forward, to import new upstream releases, one would simply run git fetch upstreamvcs; gbp import-orig --uscan, which fetches the upstream git tags, checks for new upstream tarballs, and automatically downloads, verifies, and imports the new version. See the galera-4-demo example in the Debian source packages in git explained post as a demo you can try running yourself and examine in detail.
You can also try running gbp import-orig --uscan without specifying a version. It will notice that a newer Entr version 5.7 is available, download it, and import it.
Build using git-buildpackage
From this stage onwards you should build the package using gbp buildpackage, which will do a more comprehensive build.
gbp buildpackage -uc -us
The git-buildpackage build also includes running Lintian to find potential Debian policy violations in the sources or in the resulting .deb binary packages. Many Debian Developers run lintian -EviIL +pedantic after every build to check that there are no new nags, and to validate that changes intended to address previous Lintian nags were correct.
Open a Merge Request on Salsa for Debian packaging review
Getting everything perfectly right takes a lot of effort, and may require reaching out to an experienced Debian Developer for review and guidance. Thus, you should aim to publish your initial packaging work on Salsa, Debian's GitLab instance, for review and feedback as early as possible.
For somebody to be able to easily see what you have done, you should rename your debian/latest branch to another name, for example next/debian/latest, and open a Merge Request that targets the debian/latest branch on your Salsa fork, which still has only the unmodified upstream files.
If you have followed the workflow in this post so far, you can simply run:
git checkout -b next/debian/latest
git push --set-upstream origin next/debian/latest
Open in a browser the URL shown in the remote's response to the git push
Write the Merge Request description in case the default text from your commit is not enough
Mark the MR as Draft using the checkbox
Publish the MR and request feedback
Once a Merge Request exists, discussion regarding what additional changes are needed can be conducted as MR comments. With an MR, you can easily iterate on the contents of next/debian/latest, rebase, force push, and request re-review as many times as you want.
While at it, make sure that on the Settings > CI/CD page the CI/CD configuration file field is set to debian/salsa-ci.yml, so that the CI can run and give you immediate automated feedback.
For an example of an initial packaging Merge Request, see https://salsa.debian.org/otto/entr-demo/-/merge_requests/1.
Open a Merge Request / Pull Request to fix upstream code
Due to the high quality requirements in Debian, it is fairly common that while doing the initial Debian packaging of an open source project, issues are found that stem from the upstream source code. While it is possible to carry extra patches in Debian, it is not good practice to deviate too much from upstream code with custom Debian patches. Instead, the Debian packager should try to get the fixes applied directly upstream.
Using git-buildpackage patch queues is the most convenient way to make modifications to the upstream source code so that they automatically convert into Debian patches (stored at debian/patches), and can also easily be submitted upstream as any regular git commit (and rebased and resubmitted many times over).
First, decide if you want to work out of the upstream development branch and later cherry-pick to the Debian packaging branch, or work out of the Debian packaging branch and cherry-pick to an upstream branch.
The example below starts from the upstream development branch and then cherry-picks the commit into the git-buildpackage patch queue:
git checkout -b bugfix-branch master
nano entr.c
make
./entr # verify change works as expected
git commit -a -m "Commit title" -m "Commit body"
git push # submit upstream
gbp pq import --force --time-machine=10
git cherry-pick <commit id>
git commit --amend # extend commit message with DEP-3 metadata
gbp buildpackage -uc -us -b
./entr # verify change works as expected
gbp pq export --drop --commit
git commit --amend # Write commit message along lines "Add patch to .."
The example below starts by making the fix on a git-buildpackage patch queue branch, and then cherry-picking it onto the upstream development branch:
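The commands for this variant are roughly the mirror image of the previous example; the following is a sketch, assuming the upstream development branch is master and that gbp pq names the patch queue branch patch-queue/debian/latest:
gbp pq import --force --time-machine=10
nano entr.c
gbp buildpackage -uc -us -b
./entr # verify change works as expected
git commit -a -m "Commit title" -m "Commit body" # note the commit id
git checkout master
git cherry-pick <commit id>
git push # submit upstream
git checkout patch-queue/debian/latest
git commit --amend # extend commit message with DEP-3 metadata
gbp pq export --drop --commit
git commit --amend # Write commit message along lines "Add patch to .."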
These can be run at any time, regardless of whether any debian/patches existed prior, whether existing patches applied cleanly, or whether there were old patch queue branches around. Note that the extra -b in gbp buildpackage -uc -us -b instructs it to build only binary packages, avoiding any nags from dpkg-source about modifications in the upstream sources while building in patches-applied mode.
Programming-language specific dh-make alternatives
As each programming language has its specific way of building the source code, and many other conventions regarding the file layout and more, Debian has multiple custom tools to create new Debian source packages for specific programming languages.
Notably, Python does not have its own tool, but there is a dh_make --python option for Python support directly in dh_make itself. The list is not complete and many more tools exist. For some languages there are even competing options; for Go, for example, there is Gophian in addition to dh-make-golang.
When learning Debian packaging, there is no need to learn these tools upfront. Being aware that they exist is enough, and one can learn them only if and when one starts to package a project in a new programming language.
The difference between source git repository vs source packages vs binary packages
As seen in the earlier example, running gbp buildpackage on the Entr packaging repository above will result in several files:
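The exact file names depend on version, revision and architecture, but for this example the output in the parent directory would typically be along these lines:
entr_5.6-1.dsc
entr_5.6-1.debian.tar.xz
entr_5.6.orig.tar.gz
entr_5.6.orig.tar.gz.asc
entr_5.6-1_amd64.deb
entr_5.6-1_amd64.buildinfo
entr_5.6-1_amd64.changes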
The entr_5.6-1_amd64.deb is the binary package, which can be installed on a Debian/Ubuntu system. The rest of the files constitute the source package. To do a source-only build, run gbp buildpackage -S and note the files produced:
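Again as a sketch, a source-only build would typically produce:
entr_5.6-1.dsc
entr_5.6-1.debian.tar.xz
entr_5.6.orig.tar.gz
entr_5.6.orig.tar.gz.asc
entr_5.6-1_source.buildinfo
entr_5.6-1_source.changes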
The source package files can be used to build the binary .deb for amd64, or any architecture that the package supports. It is important to grasp that the Debian source package is the preferred form to be able to build the binary packages on various Debian build systems, and the Debian source package is not the same thing as the Debian packaging git repository contents.
If the package is large and complex, the build could result in multiple binary packages. One set of package definition files in debian/ will however only ever result in a single source package.
Option to repackage source packages with Files-Excluded lists in the debian/copyright file
Some upstream projects may include binary files in their release, or other undesirable content that needs to be omitted from the source package in Debian. The easiest way to filter them out is by adding to the debian/copyright file a Files-Excluded field listing the undesired files. The debian/copyright file is read by uscan, which will repackage the upstream sources on-the-fly when importing new upstream releases.
For a real-life example, see the debian/copyright file in the Godot package, which lists the excluded files:
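The actual Godot exclusion list is not reproduced here; the following hypothetical snippet merely illustrates the syntax of the field in the header paragraph of debian/copyright:
Files-Excluded:
 thirdparty/some-bundled-library/*
 *.min.js
 doc/prebuilt/*.pdf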
The resulting repackaged upstream source tarball, as well as the upstream version component, will have an extra +ds to signify that it is not the true original upstream source but has been modified by Debian:
godot_4.3+ds.orig.tar.xz
godot_4.3+ds-1_amd64.deb
Creating one Debian source package from multiple upstream source packages is also possible
In some rare cases the upstream project may be split across multiple git repositories, or the upstream release may consist of multiple components, each in its own separate tarball. Usually these are very large projects that get some benefit from releasing components separately. If in Debian these are deemed to go into a single source package, it is technically possible using the component system in git-buildpackage and uscan. For an example see the gbp.conf and watch files in the node-cacache package.
Using this type of structure should be a last resort, as it creates complexity and inter-dependencies that are bound to cause issues later on. It is usually better to work with upstream and champion universal best practices with clear releases and version schemes.
When not to start the Debian packaging repository as a fork of the upstream one
Not all upstreams use Git for version control. It is by far the most popular, but there are still some that use e.g. Subversion or Mercurial. Who knows, maybe in the future some new version control systems will start to compete with Git. There are also projects that use Git in massive monorepos and with complex submodule setups that invalidate the basic assumptions required to map an upstream Git repository into a Debian packaging repository.
In those cases one can't use a debian/latest branch on a clone of the upstream git repository as the starting point for the Debian packaging, but must instead revert to the traditional way of starting from an upstream release tarball with gbp import-orig package-1.0.tar.gz.
Conclusion
Created in August 1993, Debian is one of the oldest Linux distributions. In the 32 years since its inception, the .deb packaging format and the tooling to work with it have evolved several generations. In the past 10 years, more and more Debian Developers have converged on certain core practices, as evidenced by https://trends.debian.net/, but there is still a lot of variance in workflows even for identical tasks. Hopefully you find this post useful in giving practical guidance on how exactly to do the most common things when packaging software for Debian.
Happy packaging!
Debian hard freeze on 2025-05-15? We bring you a new Grml release on top of that! 2025.05 codename Nudlaug.
There's plenty of new stuff, check out our official release announcement for all the details. But I'd like to highlight one feature that I particularly like: SSH service announcement with Avahi. The grml-full flavor ships Avahi, and when you enable SSH, it automatically announces the SSH service on your local network. So when e.g. booting Grml with the boot option ssh=debian, you should be able to log in to your Grml live system with ssh grml@grml.local and the password debian:
% insecssh grml@grml.local
Warning: Permanently added 'grml.local' (ED25519) to the list of known hosts.
grml@grml.local's password:
Linux grml 6.12.27-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.12.27-1 (2025-05-06) x86_64
Grml - Linux for geeks
grml@grml ~ %
Hint: grml-zshrc provides that useful shell alias insecssh, which is aliased to ssh -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null. Using those options, you aren't storing the SSH host key of the (temporary) Grml live system (permanently) in your UserKnownHostsFile.
BTW, you can run avahi-browse -d local _ssh._tcp --resolve -t to discover the SSH services on your local network.
Happy Grml-ing!
I've been using Procmail to filter mail for a long time. Reading Antoine's blog
post procmail considered
harmful, I felt
motivated (and shamed) into migrating to something else. Luckily, Enrico's
shared a detailed roadmap for moving to
Sieve,
in particular Dovecot's Sieve implementation (which provides "pipe" and
"filter" extensions).
My MTA is Exim, and for my first foray into this, I didn't want to change that1.
Exim provides two filtering languages for users: an implementation of Sieve, and
its own filter language.
Requirements
A good first step is to look at what I'm using Procmail for:
I invoke external mail filters: processes which read the mail and
emit a possibly altered mail (headers added, etc.). In particular,
crm114 (which has worked remarkably well for me)
to classify mail as spam or not, and dsafilter, to mark up Debian
Security Advisories
I file messages into different folders depending on the outcome of the
above filters
I drop mail ("killfile") from some sender addresses (persistent pests on mailing lists);
and mails containing certain hosts in the References header (as an imperfect
way of dropping mailing list threads which are replies to someone I've killfiled);
and mail encoded in a character set for a language I can't read (Russian, Korean,
etc.), and several other simple static rules
I move mailing list mail into folders, semi-automatically (see list filtering)
I strip "tagged" subjects for some mailing lists: i.e., incoming mail has
subjects like "[cs-historic-committee] help moving several tons of IBM360",
and I don't want the "[cs-historic-committee]" bit.
I file a copy of some messages, the name of which is partly
derived from the current calendar year
Exim Filters
I want to continue to do (1), which rules out Exim's implementation of Sieve,
which does not support invoking external programs. Exim's own filter language
has a pipe function that might do what I need, so let's look at how to
achieve the above with Exim Filters.
autolists
Here's an autolist recipe for Debian's mailing lists, in Exim filter
language. Contrast with the Procmail in list filtering:
if $header_list-id matches "(debian.*)\.lists\.debian\.org"
then
save Maildir/l/$1/
finish
endif
Hands down, the exim filter is nicer (although some of the rules on escape characters
in exim filters, not demonstrated here, are byzantine).
killfile
An ideal chunk of configuration for kill-filing a list of addresses is
light on boilerplate, and easy to add more addresses to in the future.
This is the best I could come up with:
if foranyaddress "someone@example.org,\
another@example.net,\
especially-bad.example.com,\
"
($reply_address contains $thisaddress
or $header_references contains $thisaddress)
then finish endif
I won't bother sharing the equivalent Procmail but it's pretty
comparable: the exim filter is no great improvement.
It would be lovely if the list of addresses could be stored elsewhere, such
as a simple text file, one line per address, or even a database.
Exim's own configuration language (distinct from this filter language) has
some nice mechanisms for reading lists of things like addresses from files
or databases. Sadly it seems the filter language lacks anything similar.
external filters
With Procmail, I pass the mail to an external program, and then read the
output of that program back, as the new content of the mail, which
continues to be filtered: subsequent filter rules inspect the headers
to see what the outcome of the filter was (is it spam?) and to decide
what to do accordingly. Crucially, we also check the return status
of the filter, to handle the case when it fails.
With Exim filters, we can use pipe to invoke an external program:
pipe "$home/mail/mailreaver.crm -u $home/mail/"
However, this is not a filter: the mail is sent to the external program,
and the exim filter's job is complete. We can't write further filter
rules to continue to process the mail: the external program would have
to do that; and we have no way of handling errors.
Here's Exim's documentation on what happens when the external command
fails:
Most non-zero codes are treated by Exim as indicating a failure of the pipe.
This is treated as a delivery failure, causing the message to be returned to
its sender.
That is definitely not what I want: if the filter broke (even temporarily), Exim
would seemingly generate a bounce to the sender address, which could be anything,
and I wouldn't have a copy of the message.
The documentation goes on to say that some shell return codes (defaulting to 73
and 75) cause Exim to treat it as a temporary error, spool the mail and retry
later on. That's a much better behaviour for my use-case. Having said that, on
the rare occasions I've broken the filter, the thing which made me notice most
quickly was spam hitting my inbox, which my Procmail recipe achieves.
removing subject tagging
Here, Exim's filter language gets unstuck. There is no way to add or alter
headers for a message in a user filter. Exim uses the same filter language
for system-wide message filtering,
and in that context, it has some extra functions: headers add <string>,
headers remove <string>, but (for reasons I don't know) these are not
available for user filters.
copy mail to archive folder
I can't see a way to derive a folder name from the calendar year.
next steps
Exim Sieve implementation and its filter language are ruled out as Procmail
replacements because they can't do at least two of the things I need to do.
However, based on Enrico's write-up, it looks like Dovecot's Sieve
implementation probably can. I was also recommended
maildrop, which I might look at if
Dovecot Sieve doesn't pan out.
I should revisit this requirement because I could probably
reconfigure exim to run my spam classifier at the system level, obviating the
need to do it in a user filter, and also raising the opportunity to do
smtp-time rejection based on the outcome.
I found a very nice script to create Notes on the iPhone from the command line by hossman over at Perlmonks.
For some weird reason Perlmonks does not allow me to reply with amendments even after I created an account. I can "preview" a reply at Perlmonks but after "create" I get "Permission Denied". Duh. vroom, if you want screenshots, contact me on IRC.
As I wrote everything up for the Perlmonks reply anyways, I'll post it here instead.
Against hossman's version 32 from 2011-02-22 I changed the following:
removed .pl from filename and documentation
added --list to list existing notes
added --hosteurope for Hosteurope mail account preferences and with it a sample how to add username and password into the script for unattended use
made the "Notes" folder the default (so -f Notes becomes obsolete)
added some UTF-8 conversions to make Umlauts work better (this is a mess in perl, see Jeremy Zawodny's writeup and Ivan Kurmanov's blog entry for some further solutions). Please try combinations of utf8::encode and ::decode, binmode utf8 for STDIN and/or STDOUT and the other hints from these linked blog entries in your local setup to get Umlauts and other non-7bit ASCII characters working. Be patient. There's more than one way to do it.
Another short status update of what happened on my side last month.
Notable might be the Cell Broadcast support for Qualcomm SoCs, the
rest is smaller fixes and QoL improvements.
phosh
Nextcloud is an open-source software suite that
enables you to set up and manage your own cloud storage and collaboration
platform. It offers a range of features similar to popular cloud services
like Google Drive or Dropbox, but with the added benefit of complete control
over your data and the server where it's hosted.
I wanted to have a look at Nextcloud and the steps to set up my own instance
with a PostgreSQL-based database together with NGinx as the webserver to
serve the WebUI. Before doing a full productive setup I wanted to play
around locally with all the needed steps, and worked them out within a
KVM machine.
While doing this I wrote down some notes, mostly to document for myself what
I need to do to get a Nextcloud installation running and usable. So this
manual describes how to set up a Nextcloud installation on Debian 12 Bookworm
based on NGinx and PostgreSQL.
Nextcloud Installation
Install PHP and PHP extensions for Nextcloud
Nextcloud is basically a PHP application so we need to install PHP packages
to get it working in the end. The following steps are based on the upstream
documentation about how to install your own
Nextcloud instance.
Installing the virtual package php on a Debian Bookworm system
would pull in the dependent meta package php8.2. This package itself would
then also pull in the package libapache2-mod-php8.2 as a dependency, which
in turn would pull in the apache2 webserver as a dependent package. This
is something I didn't want to have, as I want to use the NGinx that is already
installed on the system instead.
To avoid this we need to explicitly exclude the package libapache2-mod-php8.2
from the list of packages which we want to install. To achieve this we have
to append a hyphen - at the end of the package name, so we use
libapache2-mod-php8.2- within the package list; this tells apt to
ignore this package as a dependency. I ended up with a call along these lines
to get all needed dependencies installed.
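For example, something like the following (the exact PHP extension list here is only a plausible example, adjust it to your needs):
# the trailing hyphen on libapache2-mod-php8.2- tells apt NOT to install that package
$ sudo apt install php8.2 php8.2-fpm php8.2-pgsql php8.2-curl php8.2-gd \
    php8.2-xml php8.2-mbstring php8.2-zip php8.2-intl php8.2-bcmath \
    php8.2-gmp imagemagick libapache2-mod-php8.2-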
To make these settings effective, restart the php-fpm service
$ sudo systemctl restart php8.2-fpm
Install PostgreSQL, Create a database and user
This manual assumes we will use a PostgreSQL server on localhost. If you
have a server instance on some remote site, you can skip the installation
step here.
$ sudo apt install postgresql postgresql-contrib postgresql-client
Check the version after installation (optional step):
$ sudo -i -u postgres
postgres@host:~$ psql --version
You will see output like this:
psql (15.12 (Debian 15.12-0+deb12u2))
Exit the PSQL shell by using the command \q.
postgres=# \q
Exit the CLI of the postgres user:
postgres@host:~$ exit
Create a PostgreSQL Database and User:
Create a new PostgreSQL user (Use a strong password!):
$ sudo -u postgres psql -c "CREATE USER nextcloud_user PASSWORD '1234';"
Create new database and grant access:
$ sudo -u postgres psql -c "CREATE DATABASE nextcloud_db WITH OWNER nextcloud_user ENCODING=UTF8;"
(Optional) Check if we can now connect to the database server and the
database in detail (you will be asked for the password of the database
user!). If this is not working it makes no sense to proceed further! We need to
fix the access first!
$ psql -h localhost -U nextcloud_user -d nextcloud_db
or
$ psql -h 127.0.0.1 -U nextcloud_user -d nextcloud_db
Log out from postgres shell using the command \q.
Download and install Nextcloud
Use the following command to download the latest version of Nextcloud:
$ wget https://download.nextcloud.com/server/releases/latest.zip
Extract the file into the folder /var/www/html with the following command:
$ sudo unzip latest.zip -d /var/www/html
Change ownership of the /var/www/html/nextcloud directory to www-data.
$ sudo chown -R www-data:www-data /var/www/html/nextcloud
Configure NGinx for Nextcloud to use a certificate
In case you want to use a self-signed certificate, e.g. if you are just playing
around with a local Nextcloud setup for testing purposes, you can create one as
sketched below.
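For example, a matching self-signed key and certificate (the paths match the
NGinx configuration below; the common name is just an example) could be
generated like this:
$ sudo openssl req -x509 -nodes -days 365 -newkey rsa:4096 \
    -keyout /etc/ssl/private/nextcloud.key \
    -out /etc/ssl/certs/nextcloud.crt \
    -subj "/CN=nextcloud.local"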
If you want or need to use the service of Let's Encrypt (or similar), drop
the step above and create your required key data by using this command:
$ sudo certbot --nginx -d nextcloud.your-domain.com
You will need to adjust the path to the key and certificate in the next
step!
Change the NGinx configuration:
$ sudo vi /etc/nginx/sites-available/nextcloud.conf
Add the following snippet into the file and save it.
# /etc/nginx/sites-available/nextcloud.conf
upstream php-handler {
    #server 127.0.0.1:9000;
    server unix:/run/php/php8.2-fpm.sock;
}

# Set the immutable cache control options only for assets with a cache
# busting v argument
map $arg_v $asset_immutable {
    "" "";
    default ", immutable";
}

server {
    listen 80;
    listen [::]:80;
    # Adjust this to the correct server name!
    server_name nextcloud.local;

    # Prevent NGinx HTTP Server Detection
    server_tokens off;

    # Enforce HTTPS
    return 301 https://$server_name$request_uri;
}

server {
    listen 443 ssl http2;
    listen [::]:443 ssl http2;

    # Adjust this to the correct server name!
    server_name nextcloud.local;

    # Path to the root of your installation
    root /var/www/html/nextcloud;

    # Use Mozilla's guidelines for SSL/TLS settings
    # https://mozilla.github.io/server-side-tls/ssl-config-generator/
    # Adjust the usage and paths of the correct key data! E.g. if you want to use Let's Encrypt key material!
    ssl_certificate /etc/ssl/certs/nextcloud.crt;
    ssl_certificate_key /etc/ssl/private/nextcloud.key;
    # ssl_certificate /etc/letsencrypt/live/nextcloud.your-domain.com/fullchain.pem;
    # ssl_certificate_key /etc/letsencrypt/live/nextcloud.your-domain.com/privkey.pem;

    # Prevent NGinx HTTP Server Detection
    server_tokens off;

    # HSTS settings
    # WARNING: Only add the preload option once you read about
    # the consequences in https://hstspreload.org/. This option
    # will add the domain to a hardcoded list that is shipped
    # in all major browsers and getting removed from this list
    # could take several months.
    #add_header Strict-Transport-Security "max-age=15768000; includeSubDomains; preload" always;

    # set max upload size and increase upload timeout:
    client_max_body_size 512M;
    client_body_timeout 300s;
    fastcgi_buffers 64 4K;

    # Enable gzip but do not remove ETag headers
    gzip on;
    gzip_vary on;
    gzip_comp_level 4;
    gzip_min_length 256;
    gzip_proxied expired no-cache no-store private no_last_modified no_etag auth;
    gzip_types application/atom+xml text/javascript application/javascript application/json application/ld+json application/manifest+json application/rss+xml application/vnd.geo+json application/vnd.ms-fontobject application/wasm application/x-font-ttf application/x-web-app-manifest+json application/xhtml+xml application/xml font/opentype image/bmp image/svg+xml image/x-icon text/cache-manifest text/css text/plain text/vcard text/vnd.rim.location.xloc text/vtt text/x-component text/x-cross-domain-policy;

    # Pagespeed is not supported by Nextcloud, so if your server is built
    # with the ngx_pagespeed module, uncomment this line to disable it.
    #pagespeed off;

    # The settings allows you to optimize the HTTP2 bandwidth.
    # See https://blog.cloudflare.com/delivering-http-2-upload-speed-improvements/
    # for tuning hints
    client_body_buffer_size 512k;

    # HTTP response headers borrowed from Nextcloud .htaccess
    add_header Referrer-Policy "no-referrer" always;
    add_header X-Content-Type-Options "nosniff" always;
    add_header X-Frame-Options "SAMEORIGIN" always;
    add_header X-Permitted-Cross-Domain-Policies "none" always;
    add_header X-Robots-Tag "noindex, nofollow" always;
    add_header X-XSS-Protection "1; mode=block" always;

    # Remove X-Powered-By, which is an information leak
    fastcgi_hide_header X-Powered-By;

    # Set .mjs and .wasm MIME types
    # Either include it in the default mime.types list
    # and include that list explicitly or add the file extension
    # only for Nextcloud like below:
    include mime.types;
    types {
        text/javascript js mjs;
        application/wasm wasm;
    }

    # Specify how to handle directories -- specifying /index.php$request_uri
    # here as the fallback means that NGinx always exhibits the desired behaviour
    # when a client requests a path that corresponds to a directory that exists
    # on the server. In particular, if that directory contains an index.php file,
    # that file is correctly served; if it doesn't, then the request is passed to
    # the front-end controller. This consistent behaviour means that we don't need
    # to specify custom rules for certain paths (e.g. images and other assets,
    # /updater, /ocs-provider), and thus
    # try_files $uri $uri/ /index.php$request_uri
    # always provides the desired behaviour.
    index index.php index.html /index.php$request_uri;

    # Rule borrowed from .htaccess to handle Microsoft DAV clients
    location = / {
        if ( $http_user_agent ~ ^DavClnt) {
            return 302 /remote.php/webdav/$is_args$args;
        }
    }

    location = /robots.txt {
        allow all;
        log_not_found off;
        access_log off;
    }

    # Make a regex exception for /.well-known so that clients can still
    # access it despite the existence of the regex rule
    # location ~ /(\.|autotest|...) which would otherwise handle requests
    # for /.well-known.
    location ^~ /.well-known {
        # The rules in this block are an adaptation of the rules
        # in .htaccess that concern /.well-known.

        location = /.well-known/carddav { return 301 /remote.php/dav/; }
        location = /.well-known/caldav  { return 301 /remote.php/dav/; }

        location /.well-known/acme-challenge { try_files $uri $uri/ =404; }
        location /.well-known/pki-validation { try_files $uri $uri/ =404; }

        # Let Nextcloud's API for /.well-known URIs handle all other
        # requests by passing them to the front-end controller.
        return 301 /index.php$request_uri;
    }

    # Rules borrowed from .htaccess to hide certain paths from clients
    location ~ ^/(?:build|tests|config|lib|3rdparty|templates|data)(?:$|/) { return 404; }
    location ~ ^/(?:\.|autotest|occ|issue|indie|db_|console)               { return 404; }

    # Ensure this block, which passes PHP files to the PHP process, is above the blocks
    # which handle static assets (as seen below). If this block is not declared first,
    # then NGinx will encounter an infinite rewriting loop when it prepends /index.php
    # to the URI, resulting in a HTTP 500 error response.
    location ~ \.php(?:$|/) {
        # Required for legacy support
        rewrite ^/(?!index|remote|public|cron|core\/ajax\/update|status|ocs\/v[12]|updater\/.+|ocs-provider\/.+|.+\/richdocumentscode(_arm64)?\/proxy) /index.php$request_uri;

        fastcgi_split_path_info ^(.+?\.php)(/.*)$;
        set $path_info $fastcgi_path_info;

        try_files $fastcgi_script_name =404;

        include fastcgi_params;
        fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
        fastcgi_param PATH_INFO $path_info;
        fastcgi_param HTTPS on;

        fastcgi_param modHeadersAvailable true;     # Avoid sending the security headers twice
        fastcgi_param front_controller_active true; # Enable pretty urls
        fastcgi_pass php-handler;

        fastcgi_intercept_errors on;
        fastcgi_request_buffering off;

        fastcgi_max_temp_file_size 0;
    }

    # Serve static files
    location ~ \.(?:css|js|mjs|svg|gif|png|jpg|ico|wasm|tflite|map|ogg|flac)$ {
        try_files $uri /index.php$request_uri;

        # HTTP response headers borrowed from Nextcloud .htaccess
        add_header Cache-Control "public, max-age=15778463$asset_immutable";
        add_header Referrer-Policy "no-referrer" always;
        add_header X-Content-Type-Options "nosniff" always;
        add_header X-Frame-Options "SAMEORIGIN" always;
        add_header X-Permitted-Cross-Domain-Policies "none" always;
        add_header X-Robots-Tag "noindex, nofollow" always;
        add_header X-XSS-Protection "1; mode=block" always;

        access_log off; # Optional: Don't log access to assets
    }

    location ~ \.woff2?$ {
        try_files $uri /index.php$request_uri;
        expires 7d;     # Cache-Control policy borrowed from .htaccess
        access_log off; # Optional: Don't log access to assets
    }

    # Rule borrowed from .htaccess
    location /remote {
        return 301 /remote.php$request_uri;
    }

    location / {
        try_files $uri $uri/ /index.php$request_uri;
    }
}
Symlink the configuration from sites-available to sites-enabled.
$ ln -s /etc/nginx/sites-available/nextcloud.conf /etc/nginx/sites-enabled/
Restart NGinx and access the URI in the browser.
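One way to do that (test the configuration first):
$ sudo nginx -t
$ sudo systemctl restart nginx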
Go through the installation of Nextcloud.
The user entered in the installation dialog, e.g. administrator or similar,
will get administrative access rights in Nextcloud!
To adjust the database connection detail you have to edit the file
$install_folder/config/config.php.
Means here in the example within this post you would need to modify
/var/www/html/nextcloud/config/config.php to control or change the
database connection.
---%<---
'dbname' => 'nextcloud_db',
'dbhost' => 'localhost', # (Or your remote PostgreSQL server address if you have one.)
'dbport' => '',
'dbtableprefix' => 'oc_',
'dbuser' => 'nextcloud_user',
'dbpassword' => '1234', # (The password you set for the database user.)
--->%---
After the installation and setup of the Nextcloud PHP application there are
more steps to be done. Have a look into the WebUI what you will need to do
as additional steps like create a cronjob or tuning of some more PHP
configurations.
If you've done everything correctly you should see a login page similar to
this:
Optional other steps for more enhanced configuration modifications
Move the data folder to somewhere else
The data folder is the root folder for all user content. By default it is
located in $install_folder/data, so in our case here it is in
/var/www/html/nextcloud/data.
Move the data directory outside the web server document root.
$ sudo mv /var/www/html/nextcloud/data /var/nextcloud_data
Ensure the access permissions are correct (mostly not needed if you just moved the folder):
$ sudo chown -R www-data:www-data /var/nextcloud_data
$ sudo chown -R www-data:www-data /var/www/html/nextcloud/
Update the Nextcloud configuration:
Open the config/config.php file of your Nextcloud installation.
$ sudo vi /var/www/html/nextcloud/config/config.php
Update the datadirectory parameter to point to the new location of your data directory.
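For example, assuming the path used above (sketch):
---%<---
'datadirectory' => '/var/nextcloud_data',
--->%---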
Make the installation available for multiple FQDNs on the same server
Adjust the Nextcloud configuration to listen and accept requests for
different domain names. Configure and adjust the key trusted_domains
accordingly.
$ sudo vi /var/www/html/nextcloud/config/config.php
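A sketch of what that key could look like (the second entry is just an example domain):
---%<---
'trusted_domains' =>
  array (
    0 => 'nextcloud.local',
    1 => 'cloud.example.com',
  ),
--->%---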
Create and adjust the needed site configurations for the webserver.
Restart the NGinx unit.
An error message about .ocdata might occur:
".ocdata is not found inside the data directory"
Create the file using touch and set the necessary permissions.
$ sudo touch /var/nextcloud_data/.ocdata
$ sudo chown -R www-data:www-data /var/nextcloud_data/
The password for the administrator user is unknown
Log in to your server:
SSH into the server where your PostgreSQL database is hosted.
Switch to the PostgreSQL user:
$ sudo -i -u postgres
Access the PostgreSQL command line
psql
List the databases: (If you're unsure which database is being used by Nextcloud, you can list all databases with the \l command.)
\l
Switch to the Nextcloud database:
Switch to the specific database that Nextcloud is using.
\c nextcloud_db
Reset the password for the Nextcloud database user:
ALTER USER nextcloud_user WITH PASSWORD 'new_password';
Exit the PostgreSQL command line:
\q
Verify Database Configuration:
Check the database connection details in the config.php file to ensure they are correct.
$ sudo vi /var/www/html/nextcloud/config/config.php
Replace nextcloud_db, nextcloud_user, and your_password with your actual database name, user, and password.
---%<---
'dbname' => 'nextcloud_db',
'dbhost' => 'localhost', # (or your PostgreSQL server address)
'dbport' => '',
'dbtableprefix' => 'oc_',
'dbuser' => 'nextcloud_user',
'dbpassword' => '1234', # (The password you set for nextcloud_user.)
--->%---
Restart NGinx and access the UI through the browser.
Welcome to the third report in 2025 from the Reproducible Builds project. Our monthly reports outline what we've been up to over the past month, and highlight items of news from elsewhere in the increasingly-important area of software supply-chain security. As usual, however, if you are interested in contributing to the Reproducible Builds project, please visit our Contribute page on our website.
Table of contents:
Debian bookworm live images now fully reproducible from their binary packages
Roland Clobus announced on our mailing list this month that all the major desktop variants (i.e. GNOME, KDE, etc.) can be reproducibly created for Debian bullseye, bookworm and trixie from their (pre-compiled) binary packages.
Building reproducible Debian live images does not require building from reproducible source code, but this is still a remarkable achievement. A large proportion of the binary packages that comprise these live images can be (and were) built reproducibly, but live image generation works at a higher level. (By contrast, full or end-to-end reproducibility of a bootable OS image will, in time, require both the compile-the-packages and the build-the-bootable-image stages to be reproducible.)
Nevertheless, in response, Roland's announcement generated significant congratulations as well as some discussion regarding the finer points of the terms employed: a full outline of the replies can be found here.
The news was also picked up by Linux Weekly News (LWN) as well as Hacker News.
LWN: Fedora change aims for 99% package reproducibility
Linux Weekly News (LWN) contributor Joe Brockmeier has published a detailed round-up on how Fedora change aims for 99% package reproducibility. The article opens by mentioning that although Debian "has been working toward reproducible builds for more than a decade", the Fedora project has now:
progressed far enough that the project is now considering a change proposal for the Fedora 43 development cycle, expected to be released in October, with a goal of making 99% of Fedora's package builds reproducible. So far, reaction to the proposal seems favorable and focused primarily on how to achieve the goal with minimal pain for packagers rather than whether to attempt it.
Over the last few releases, we [Fedora] changed our build infrastructure to make package builds reproducible. This is enough to reach 90%. The remaining issues need to be fixed in individual packages. After this Change, package builds are expected to be reproducible. Bugs will be filed against packages when an irreproducibility is detected. The goal is to have no fewer than 99% of package builds reproducible.
Python adopts PEP standard for specifying package dependencies
Python developer Brett Cannon reported on Fosstodon that PEP 751 was recently accepted. This design document has the purpose of describing "a file format to record Python dependencies for installation reproducibility". As the abstract of the proposal writes:
This PEP proposes a new file format for specifying dependencies to enable reproducible installation in a Python environment. The format is designed to be human-readable and machine-generated. Installers consuming the file should be able to calculate what to install without the need for dependency resolution at install-time.
The PEP, which itself supersedes PEP 665, mentions that there are "at least five well-known solutions to this problem in the community".
OSS Rebuild real-time validation and tooling improvements
OSS Rebuild aims to automate rebuilding upstream language packages (e.g. from PyPI, crates.io, npm registries) and publish signed attestations and build definitions for public use.
OSS Rebuild is now attempting rebuilds as packages are published, shortening the time to validating rebuilds and publishing attestations.
Aman Sharma contributed classifiers and fixes for common sources of non-determinism in JAR packages.
Improvements were also made to some of the core tools in the project:
timewarp for simulating the registry responses from sometime in the past.
proxy for transparent interception and logging of network activity.
SimpleX Chat server components now reproducible
SimpleX Chat is a privacy-oriented decentralised messaging platform that eliminates user identifiers and metadata, offers end-to-end encryption and has a unique approach to decentralised identity. Starting from version 6.3, however, Simplex has implemented reproducible builds for its server components. This advancement allows anyone to verify that the binaries distributed by SimpleX match the source code, improving transparency and trustworthiness.
Three new scholarly papers
Aman Sharma of the KTH Royal Institute of Technology of Stockholm, Sweden published a paper on Build and Runtime Integrity for Java (PDF). The paper's abstract notes that "Software Supply Chain attacks are increasingly threatening the security of software systems" and goes on to compare build- and run-time integrity:
Build-time integrity ensures that the software artifact creation process, from source code to compiled binaries, remains untampered. Runtime integrity, on the other hand, guarantees that the executing application loads and runs only
trusted code, preventing dynamic injection of malicious components.
The recently mandated software bill of materials (SBOM) is intended to help mitigate software supply-chain risk. We discuss extensions that would enable an SBOM to serve as a basis for making trust assessments thus also serving as a proactive defense.
A full PDF of the paper is available.
Lastly, congratulations to Giacomo Benedetti of the University of Genoa for publishing their PhD thesis. Titled Improving Transparency, Trust, and Automation in the Software Supply Chain, Giacomo's thesis:
addresses three critical aspects of the software supply chain to enhance security: transparency, trust, and automation. First, it investigates transparency as a mechanism to empower developers with accurate and complete insights into the software components integrated into their applications. To this end, the thesis introduces SUNSET and PIP-SBOM, leveraging modeling and SBOMs (Software Bill of Materials) as foundational tools for transparency and security. Second, it examines software trust, focusing on the effectiveness of reproducible builds in major ecosystems and proposing solutions to bolster their adoption. Finally, it emphasizes the role of automation in modern software management, particularly in ensuring user safety and application reliability. This includes developing a tool for automated security testing of GitHub Actions and analyzing the permission models of prominent platforms like GitHub, GitLab, and BitBucket.
Debian developer Simon Josefsson published two reproducibility-related blog posts this month. The first was on the topic of Reproducible Software Releases, which discusses some techniques and gotchas that can be encountered when generating reproducible source packages, i.e. ensuring that the source code archives that open-source software projects release can be reproduced by others. Simon's second post builds on his earlier experiments with reproducing parts of Trisquel/Debian. Titled On Binary Distribution Rebuilds, it discusses potential methods to bootstrap a binary distribution like Debian from some other bootstrappable environment like Guix.
Jochen Sprickerhof uploaded sbuild version 0.88.5 with a change relevant to reproducible builds: specifically, the build_as_root_when_needed functionality still supports older versions of dpkg(1). []
The IzzyOnDroid Android APK repository reached another milestone in March, crossing the 40% coverage mark: specifically, more than 42% of the apps in the repository are now reproducible.
Thanks to funding by NLnet/Mobifree, the project was also able to put more
time into its tooling. For instance, developers can now easily run their own verification builder in less than 5 minutes. This currently supports Debian-based systems, but support for RPM-based systems is incoming. Future work is in the pipeline, including documentation, guidelines and helpers for debugging.
Fedora developer Zbigniew Jędrzejewski-Szmek announced a work-in-progress script called fedora-repro-build which attempts to reproduce an existing package within a Koji build environment. Although the project's README file lists a number of fields that will always or almost always vary (and there is a non-zero list of other known issues), this is an excellent first step towards full Fedora reproducibility (see above for more information).
Lastly, in openSUSE news, Bernhard M. Wiedemann posted another monthly update for his work there.
[What] would it take to compromise an entire Linux distribution directly through their public infrastructure? Is it possible to perform such a compromise as simple security researchers with no available resources but time?
diffoscope & strip-nondeterminism
diffoscope is our in-depth and content-aware diff utility that can locate and diagnose reproducibility issues. This month, Chris Lamb made the following changes, including preparing and uploading versions 290, 291, 292 and 293 to Debian:
Bug fixes:
file(1) version 5.46 now returns XHTML document for .xhtml files such as those found nested within our .epub tests. []
Also consider .aar files as APK files, at least for the sake of diffoscope. []
Require the new, upcoming, version of file(1) and update our quine-related testcase. []
Codebase improvements:
Ensure all calls to our_check_output in the ELF comparator have the potential CalledProcessError exception caught. [][]
Correct an import masking issue. []
Add a missing subprocess import. []
Reformat openssl.py. []
Update copyright years. [][][]
In addition, Ivan Trubach contributed a change to ignore the st_size metadata entry for directories as it is essentially arbitrary and introduces unnecessary or even spurious changes. []
Website updates
Once again, there were a number of improvements made to our website this month, including:
Hervé Boutemy updated the JVM documentation to clarify that the target is rebuild attestation. []
Lastly, Holger Levsen added Julien Malka and Zbigniew Jędrzejewski-Szmek to our Involved people page [][], as well as replaced suggestions to follow us on Twitter/X with suggestions to follow us on Mastodon instead [][].
Reproducibility testing framework
The Reproducible Builds project operates a comprehensive testing framework running primarily at tests.reproducible-builds.org in order to check packages and other artifacts for reproducibility. In March, a number of changes were made by Holger Levsen, including:
And finally, node maintenance was performed by Holger Levsen [][][] and Mattia Rizzolo [][].
Upstream patches
The Reproducible Builds project detects, dissects and attempts to fix as many currently-unreproducible packages as possible. We endeavour to send all of our patches upstream where appropriate. This month, we wrote a large number of such patches, including:
Finally, if you are interested in contributing to the Reproducible Builds project, please visit our Contribute page on our website. However, you can get in touch with us via:
Preparing for Trixie, by Raphaël Hertzog
As we are approaching the trixie freeze, it is customary for Debian developers
to review their packages and clean them up in preparation for the next stable
release.
That's precisely what Raphaël did with
publican, a package that had not seen
any change since the last Debian release and that partially stopped working
along the way due to a major Perl upgrade. While upstream's activity is close to
zero, hope is not yet entirely gone as the git repository moved to a new
location a couple of months ago and
contained the required fix. Raphaël also developed another fix to avoid an
annoying warning that was seen at runtime.
Raphaël also ensured that the last upstream version of zim was uploaded to
Debian unstable, and developed a fix for gnome-shell-extension-hamster to make
it work with GNOME 48
and thus ensure that the package does not get removed from trixie.
Abseil and re2 transition in Debian, by Stefano Rivera
One of the last transitions to happen for trixie was an update to
abseil, bringing it up to 202407. This library
is a dependency for one of Freexian's customers, as well as blocking newer
versions of re2, a package maintained by Stefano.
The transition had been stalled for several months while some issues with
reverse dependencies were investigated and dealt with. It took a final push to
make the transition happen, including fixing a few newly discovered problems
downstream. The abseil package's autopkgtests were (trivially) broken by newer
cmake versions, and some tests started failing on PPC64 (a known issue
upstream).
debvm uploaded, by Helmut Grohne
debvm is a command line tool for
quickly creating a Debian-based virtual machine for testing purposes. Over time,
it accumulated quite a few minor issues as well as CI failures. The most
notorious one was an ARM32 failure present since August. It was diagnosed down
to a glibc bug by Tj and Chris Hofstaedtler
and little has happened since then. To have debvm work somewhat, it now
contains a workaround for this situation. Few changes are expected to be
noticeable, but related tools such as apt, file, linux, passwd, and
qemu required quite a few adaptations all over the place. Much of the
necessary debugging was contributed by others.
DebConf 25 Registration website, by Stefano Rivera and Santiago Ruano Rincón
DebConf 25, the annual Debian developer conference, is now open for
registration.
Other than preparing the conference
website, getting
there always requires some last minute changes to the
software
behind the registration interface and this year was no exception. Every year,
the conference is a little different to previous years, and has some different
details that need to be captured from attendees. And every year we make minor
incremental improvements to fix long-standing problems.
New concepts this year included: brunch, the closing talks on the departure day,
venue security clearance, partial contributions towards food and accommodation
bursaries, and attendee-selected bursary budgets.
Miscellaneous contributions
Helmut uploaded
guess-concurrency incorporating
feedback from others.
Helmut reacted to
rebootstrap CI results and
adapted it to cope with changes in unstable.
Helmut researched real world /usr-move fallout though little was actually
attributable. He also NMUed systemd unsuccessfully.
Helmut sent 12 cross build patches.
Helmut looked into undeclared file conflicts in Debian more systematically and
filed quite some
bugs.
Lucas worked on the CFP and tracks definition for DebConf 25.
Lucas worked on some bits involving Rails 7 transition.
Carles investigated why the piuparts job on salsa-ci/pipeline was passing but
failing on piuparts.debian.org for the simplemonitor package. He created an
issue and MR
with a suggested fix, which is under discussion.
Carles improved the documentation of salsa-ci/pipeline: added documentation
for different variables.
Carles made the debian-history package reproducible (with help from Chris Lamb).
Carles updated simplemonitor package (new upstream version), prepared a new
qdacco version (fixed bugs in qdacco, packaged with the upgrade from Qt 5 to Qt
6).
Carles reviewed and submitted translations to Catalan for adduser, apt,
shadow, apt-listchanges.
Carles reviewed, created merge-requests for translations to Catalan of 38
packages (using
po-debconf-manager
tooling). Created 40 bug reports for some merge requests that haven't been
actioned for some time.
Colin Watson fixed 59 RC bugs (including 26 packages broken by the
long-overdue removal of dh-python s dependency on python3-setuptools), and
upgraded 38 packages (mostly Python-related) to new upstream versions.
Stefano bisected and fixed a pypy
translation regression on Debian stable and older on 32-bit ARM.
Emilio coordinated and helped finish various transitions in light of the
transition freeze.
Thorsten Alteholz uploaded cups-filters to fix an FTBFS with a new upstream
version of qpdf.
With the aim of enhancing the support for packages related to Software Bill of
Materials (SBOMs) in recent industrial standards, Santiago has worked on
finishing the packaging of and uploaded CycloneDX python
library. There is on-going
work about SPDX python tools, but it
requires (build-)dependencies currently not shipped in Debian, such as
owlrl and
pyshacl.
Anupa worked with the Publicity team to announce the Debian 12.10 point
release.
Anupa, with the support of Santiago, prepared an announcement and announced the
opening of the CfP and registrations for DebConf 25.
As you might know I'm not much of an Android user (let alone
developer) but in order to figure out how something low level works
you sometimes need to peek at how vendor kernels handle it. For
that it is often useful to add additional debugging.
One such case is QMI communication going on in Qualcomm SOCs. Joel
Selvaraj wrote some nice
tooling
for this.
To make use of this, a rooted device and a small kernel patch are needed,
and what would be a no-brainer with Linux Mobile took me a moment to
get working on Android. Here are the steps I took on a Pixel 3a to
first root the device via Magisk, then build the patched kernel and
put that into a boot.img to boot it.
Flashing the factory image
If you still have Android on the device you can skip this step.
You can get Android 12 from
developers.google.com. I've
downloaded sargo-sp2a.220505.008-factory-071e368a.zip. Then put the
device into Fastboot mode (Power + Vol-Down), connect it to your
PC via USB, unzip/unpack the archive and reflash the phone:
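For example, using the flash-all.sh script contained in the factory image
(the name of the unpacked directory may differ):
$ unzip sargo-sp2a.220505.008-factory-071e368a.zip
$ cd sargo-sp2a.220505.008
$ ./flash-all.sh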
This wipes your device! I had to run it twice since it would time out
on the first run. Note that this unpacked zip contains another zip
(image-sargo-sp2a.220505.008.zip) which will become useful below.
Enabling USB debugging
Now boot Android and enable Developer mode by going to Settings > About,
then touching Build Number (at the very bottom) 7 times.
Go back one level, then go to System > Developer Options and
enable "USB Debugging".
Obtaining boot.img
There are several ways to get boot.img. If you just flashed Android
above then you can fetch boot.img from the already mentioned
image-sargo-sp2a.220505.008.zip:
unzip image-sargo-sp2a.220505.008.zip boot.img
If you want to fetch the exact boot.img from your device you can use
TWRP (see the very end of this post).
Becoming root with Magisk
Being able to su via adb will later be useful to fetch kernel
logs. For that we first download Magisk as APK. At the time of writing
v28.1 is
current.
Once downloaded we upload the APK and the boot.img from the previous
step onto the phone (which needs to have Android booted):
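For example (the Magisk APK file name depends on the version you downloaded):
$ adb push Magisk-v28.1.apk /sdcard/Download/
$ adb push boot.img /sdcard/Download/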
In Android open the Files app, navigate to /sdcard/Download and
install the Magisk APK by opening the APK.
We now want to patch boot.img to get su via adb to work (so we
can run dmesg). This happens by hitting Install in the Magisk app,
then "Select a file to patch". You then select the boot.img we just
uploaded.
The installation process will create a magisk_patched-<random>.img
in /sdcard/Download. We can pull that file via adb back to our
PC:
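For example (your random suffix will differ):
$ adb pull /sdcard/Download/magisk_patched-28100_3ucVs.img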
Now boot the phone again, open the Magisk app, go to SuperUser at
the bottom and enable Shell.
If you now connect to your phone via adb again, su should
work:
adb shell
su
As noted above if you want to keep your Android installation pristine
you don't even need to flash this Magisk enabled boot.img. I've flashed
it so I have su access for other operations too. If you don't want
to flash it you can still test boot it via:
fastboot boot magisk_patched-28100_3ucVs.img
and then perform the same adb shell su check as above.
Building the custom kernel
For our QMI debugging to work we need to patch the kernel a bit and
place that in boot.img too. So let's build the kernel first. For
that we install the necessary tools (which are thankfully packaged in
Debian) and fetch the Android kernel sources:
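A sketch of what that could look like (the package names and the manifest
branch are only examples, pick whatever matches your device and firmware):
$ sudo apt install repo adb fastboot
$ mkdir android-kernel && cd android-kernel
# the branch name below is a guess; use the one matching your device/firmware
$ repo init -u https://android.googlesource.com/kernel/manifest -b android-msm-bonito-4.9-android12L
$ repo sync -j$(nproc)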
With that we can apply Joel's kernel patches and also compile in the touch controller
driver so we don't need to worry if the modules in the initramfs match the kernel. The kernel
sources are in private/msm-google. I've just applied the diffs on top with patch and modified
the defconfig and committed the changes. The resulting tree is
here.
We then build the kernel:
PATH=/usr/sbin:$PATH ./build_bonito.sh
The resulting kernel is at
./out/android-msm-pixel-4.9/private/msm-google/arch/arm64/boot/Image.lz4-dtb.
In order to boot that kernel I found it to be the simplest to just
replace the kernel in the Magisk patched boot.img as we have that
already. In case you have already deleted that for any reason we can
always fetch the current boot.img from the phone via TWRP (see below).
Preparing a new boot.img
To replace the kernel in our Magisk enabled
magisk_patched-28100_3ucVs.img from above with the just built kernel
we can use mkbootimg. I basically copied the steps we're
using when building the boot.img on the Linux Mobile side:
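A rough sketch of the repacking (take the real values, in particular the kernel
command line and any extra header fields, from what unpack_bootimg prints for
the original image):
# unpack the Magisk-patched image to reuse its ramdisk and parameters
$ unpack_bootimg --boot_img magisk_patched-28100_3ucVs.img --out unpacked
# repack with the freshly built kernel
$ mkbootimg \
    --kernel ./out/android-msm-pixel-4.9/private/msm-google/arch/arm64/boot/Image.lz4-dtb \
    --ramdisk unpacked/ramdisk \
    --cmdline "<cmdline printed by unpack_bootimg>" \
    --output boot.patched.img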
This will give you a boot.patched.img with the just built kernel.
Boot the new kernel via fastboot
We can now boot the new boot.patched.img. No need to flash that onto
the device for that:
fastboot boot boot.patched.img
Fetching the kernel logs
With that we can fetch the kernel logs with the debug output via adb:
adb shell su -c 'dmesg -t' > dmesg_dump.xml
or already filtering out the QMI commands:
adb shell su -c 'dmesg -t' | grep "@QMI@" | sed -e "s/@QMI@//g" &> sargo_qmi_dump.xml
That's it. You can apply this method for testing out other kernel
patches as well. If you want to apply the above to other devices you basically
need to make sure you patch the right kernel sources, the other steps should be
very similar.
In case you just need a rooted boot.img for sargo you can find a
patched one here.
If this procedure can be improved / streamlined somehow please
let me know.
Appendix: Fetching boot.img from the phone
If, for some reason you lost boot.img somewhere on the way you can always use
TWRP to fetch the boot.img currently in use on your phone.
First get TWRP for the Pixel 3a. You can
boot that directly by putting your device into fastboot mode, then running:
fastboot boot twrp-3.7.1_12-1-sargo.img
Within TWRP select Backup > Boot and back up the file. You can then use adb shell
to locate the backup in /sdcard/TWRP/BACKUPS/ and pull it:
Most of my Debian contributions this month were
sponsored by Freexian.
You can also support my work directly via
Liberapay.
OpenSSH
Changes in dropbear 2025.87 broke OpenSSH's regression
tests. I cherry-picked the fix.
I reviewed and merged patches from Luca
Boccassi to
send and accept the COLORTERM and NO_COLOR environment variables.
Python team
Following up on last month, I fixed some
more uscan errors:
python-ewokscore
python-ewoksdask
python-ewoksdata
python-ewoksorange
python-ewoksutils
python-processview
python-rsyncmanager
I upgraded these packages to new upstream versions:
python-mastodon (in the course of which I found
#1101140 in blurhash-python and
proposed a small
cleanup to slidge)
python-model-bakery
python-multidict
python-pip
python-rsyncmanager
python-service-identity
python-setproctitle
python-telethon
python-trio
python-typing-extensions
responses
setuptools-scm
trove-classifiers
zope.testrunner
In bookworm-backports, I updated python-django to 3:4.2.19-1.
Although Debian's upgrade to python-click 8.2.0 was
reverted for the time being, I fixed a
number of related problems anyway, since we're going to have to deal with it eventually:
dh-python dropped its dependency on python3-setuptools in 6.20250306, which
was long overdue, but it had quite a bit of fallout; in most cases this was
simply a question of adding build-dependencies on python3-setuptools, but in
a few cases there was a missing build-dependency on
python3-typing-extensions which had previously been pulled in as a
dependency of python3-setuptools. I fixed these bugs resulting from this:
We agreed to remove python-pytest-flake8.
In support of this, I removed unnecessary build-dependencies from
pytest-pylint, python-proton-core, python-pyzipper, python-tatsu,
python-tatsu-lts, and python-tinycss, and filed #1101178 on
eccodes-python and #1101179 on
rpmlint.
There was a dnspython autopkgtest regression on
s390x. I independently tracked that down
to a pylsqpack bug and came up with a reduced test case before realizing
that Pranav P had already been working on it; we then worked together on it
and I uploaded their patch to Debian.
I fixed various other build/test failures:
Another short status update of what happened on my side last
month. Some more ModemManager bits landed, Phosh
0.46 is out, haptic feedback
is now better tunable plus some more. See below for details (no April 1st
joke in there, I promise):
phosh