Search Results: "Bastian Venthur"

20 April 2024

Bastian Venthur: Help needed: creating a WSDL file to interact with debbugs

I am upstream and Debian package maintainer of python-debianbts, which is a Python library that allows for querying Debian's Bug Tracking System (BTS). python-debianbts is used by reportbug, the standard tool to report bugs in Debian, and is therefore the glue between reportbug and the BTS. debbugs, the software that powers Debian's BTS, provides a SOAP interface for querying the BTS. Unfortunately, SOAP is not a very popular protocol anymore, and I'm facing the second migration to another underlying SOAP library as they continue to become unmaintained over time. Zeep, the library I'm currently considering, requires a WSDL file in order to work with a SOAP service; debbugs, however, does not provide one. Since I'm not familiar with WSDL, I need help from someone who can create a WSDL file for debbugs, so I can migrate python-debianbts away from pysimplesoap to zeep.

How did we get here?

Back in the olden days, reportbug was querying the BTS by parsing its HTML output. While this worked, it tightly coupled the user-facing presentation of the BTS with critical functionality of the bug reporting tool. The setup was fragile, prone to breakage, and did not allow changing anything in the BTS frontend for fear of breaking reportbug itself. In 2007, I started to work on reportbug-ng, a user-friendly alternative to reportbug, targeted at users not comfortable using the command line. Early on, I decided to use the BTS' SOAP interface instead of parsing HTML like reportbug did. In 2008, I extracted the code that dealt with the BTS into a separate Python library, and after some collaboration with the reportbug maintainers, reportbug adopted python-debianbts in 2011 and has used it ever since. In 2015, I was working on porting python-debianbts to Python 3. During that process, it turned out that its major dependency, SOAPpy, had been pretty much unmaintained for years and was blocking the Python 3 transition. Thanks to the help of Gaetano Guerriero, who ported python-debianbts to pysimplesoap, the migration was unblocked and could proceed. In 2024, almost ten years later, pysimplesoap seems to be unmaintained as well, and I have to look again for alternatives. The most promising one right now seems to be zeep. Unfortunately, zeep requires a WSDL file for working with a SOAP service, which debbugs does not provide.

How can you help?

reportbug (and thus python-debianbts) is used by thousands of users and I have a certain responsibility to keep things working properly. Since I simply don't know enough about WSDL to create such a file for debbugs myself, I'm looking for someone who can help me with this task. If you're familiar with SOAP, WSDL and optionally debbugs, please get in touch with me. I don't speak Perl, so I'm not really able to read debbugs' code, but I do know some things about the SOAP requests and replies due to my work on python-debianbts, so I'm sure we can work something out. There is a WSDL file for a debbugs version used by GNU, but I don't think it's official and it currently does not work with zeep. It may be a good starting point, though.

The future of debbugs' API

While we can probably continue to support debbugs' SOAP interface for a while, I don't think it's very sustainable in the long run. A simpler, well-documented REST API that returns JSON seems more appropriate nowadays. The queries and replies that debbugs currently supports are simple enough to design a REST API with JSON around them.
The benefit would be less complex libraries on the client side and probably easier maintainability on the server side as well. debbugs' maintainer seemed to be in agreement with this idea back in 2018. I created an attempt to define a new API (HTML render), but somehow we got stuck and no progress has been made since then. I'm still happy to help shape such an API for debbugs, but I can't really implement anything in debbugs itself, as it is written in Perl, which I'm not familiar with.
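To make the immediate migration problem more concrete: zeep builds its entire client from the service description, so without a WSDL there is nothing to instantiate. A rough sketch of what the migration target would look like, assuming a (currently non-existent) debbugs.wsdl and the get_bugs operation that python-debianbts already relies on:

from zeep import Client

# "debbugs.wsdl" is hypothetical; this is exactly the file that is missing
client = Client("debbugs.wsdl")
# get_bugs takes key/value pairs, e.g. all bugs filed against a package
bug_numbers = client.service.get_bugs("package", "python-debianbts")
print(bug_numbers)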

26 January 2024

Bastian Venthur: Investigating popularity of Python build backends over time

Inspired by a Mastodon post by Françoise Conil, who investigated the current popularity of build backends used in pyproject.toml files, I wanted to investigate how the popularity of build backends used in pyproject.toml files evolved over the years since the introduction of PEP-0517 in 2015.

Getting the data

Tom Forbes provides a huge dataset that contains information about every file within every release uploaded to PyPI. To get the current dataset, we can use:
curl -L --remote-name-all $(curl -L "https://github.com/pypi-data/data/raw/main/links/dataset.txt")
This will download approximately 30GB of parquet files, providing detailed information about each file included in a PyPI upload, including:
  1. project name, version and release date
  2. file path, size and line count
  3. hash of the file
The dataset does not contain the actual files themselves though; more on that in a moment.

Querying the dataset using duckdb

We can now use duckdb to query the parquet files directly. Let's look into the schema first:
describe select * from '*.parquet';

┌──────────────────┬─────────────┬─────────┐
│ column_name      │ column_type │ null    │
│ varchar          │ varchar     │ varchar │
├──────────────────┼─────────────┼─────────┤
│ project_name     │ VARCHAR     │ YES     │
│ project_version  │ VARCHAR     │ YES     │
│ project_release  │ VARCHAR     │ YES     │
│ uploaded_on      │ TIMESTAMP   │ YES     │
│ path             │ VARCHAR     │ YES     │
│ archive_path     │ VARCHAR     │ YES     │
│ size             │ UBIGINT     │ YES     │
│ hash             │ BLOB        │ YES     │
│ skip_reason      │ VARCHAR     │ YES     │
│ lines            │ UBIGINT     │ YES     │
│ repository       │ UINTEGER    │ YES     │
└──────────────────┴─────────────┴─────────┘
11 rows, 6 columns
From all the files mentioned in the dataset, we only care about pyproject.toml files that are in the project's root directory. Since we'll still have to download the actual files, we need to get the path and the repository to construct the corresponding URL to the mirror that contains all files in a bunch of huge git repositories. Some files are not available on the mirrors; to skip these, we only take files where the skip_reason is empty. We also care about the timestamp of the upload (uploaded_on) and the hash to avoid processing identical files twice:
select
    path,
    hash,
    uploaded_on,
    repository
from '*.parquet'
where
    skip_reason == '' and
    lower(string_split(path, '/')[-1]) == 'pyproject.toml' and
    len(string_split(path, '/')) == 5
order by uploaded_on desc
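Since the result is processed in Python afterwards, one convenient way to run the query is through duckdb's Python API, keeping the result as a pandas DataFrame. A sketch (not necessarily how the analysis script does it, but the results name matches the value_counts output shown further down):

import duckdb

query = """
select path, hash, uploaded_on, repository
from '*.parquet'
where
    skip_reason == '' and
    lower(string_split(path, '/')[-1]) == 'pyproject.toml' and
    len(string_split(path, '/')) == 5
order by uploaded_on desc
"""
results = duckdb.sql(query).df()   # load the result into a pandas DataFrame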
This query runs for a few minutes on my laptop and returns ~1.2M rows.

Getting the actual files

Using the repository and path, we can now construct a URL from which we can fetch the actual file for further processing:
url = f"https://raw.githubusercontent.com/pypi-data/pypi-mirror-{repository}/code/{path}"
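The parsing step itself is small. A sketch of a helper for fetching one such URL and reading its build backend (this is a hypothetical helper, not the actual analysis script; it assumes Python 3.11+ for tomllib):

import tomllib
import urllib.request

def get_build_backend(url):
    # fetch one pyproject.toml from the mirror URL constructed above
    with urllib.request.urlopen(url) as response:
        raw = response.read().decode("utf-8", errors="replace")
    try:
        pyproject = tomllib.loads(raw)
    except tomllib.TOMLDecodeError:
        return None
    # the build backend lives in the [build-system] table
    return pyproject.get("build-system", {}).get("build-backend")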
We can download the individual pyproject.toml files and parse them (as sketched above) to read the build-backend into a dictionary mapping the file hash to the build backend. Downloads on GitHub are rate-limited, so downloading 1.2M files will take a couple of days. By skipping files with a hash we've already processed, we can avoid downloading the same file more than once, cutting the required downloads by circa 50%.

Results

Assuming the data is complete and my analysis is sound, these are the findings: There is a surprising number of build backends in use, but the overall number of uploads per build backend decreases quickly, with a long tail of single uploads:
>>> results.backend.value_counts()
backend
setuptools        701550
poetry            380830
hatchling          56917
flit               36223
pdm                11437
maturin             9796
jupyter             1707
mesonpy              625
scikit               556
                   ...
postry                 1
tree                   1
setuptoos              1
neuron                 1
avalon                 1
maturimaturinn         1
jsonpath               1
ha                     1
pyo3                   1
Name: count, Length: 73, dtype: int64
We pick only the top 4 build backends, and group the remaining ones (including PDM and Maturin) into "other" so they are accounted for as well.

The following plot shows the relative distribution of build backends over time. Each bin represents a time span of 28 days; I chose 28 days to reduce visual clutter. Within each bin, the height of the bars corresponds to the relative proportion of uploads during that time interval:

[Figure: Relative distribution of build backends over time]

Looking at the right side of the plot, we see the current distribution. It confirms Françoise's findings about the current popularity of build backends. Between 2018 and 2020 the graph exhibits significant fluctuations, due to the relatively low number of uploads utilizing pyproject.toml files. During that early period, Flit started as the most popular build backend, but was eventually displaced by Setuptools and Poetry. Between 2020 and 2022, the overall usage of pyproject.toml files increased significantly. By the end of 2022, the share of Setuptools peaked at 70%. After 2020, other build backends experienced a gradual rise in popularity. Among these, Hatch emerged as a notable contender, steadily gaining traction and ultimately stabilizing at 10%.

We can also look into the absolute distribution of build backends over time:

[Figure: Absolute distribution of build backends over time]

The plot shows that Setuptools has the strongest growth trajectory, surpassing all other build backends. Poetry and Hatch are growing at a comparable rate, but since Hatch started roughly 4 years after Poetry, it is lagging behind in popularity. Despite not being among the most widely used backends anymore, Flit maintains a steady and consistent growth pattern, indicating its enduring relevance in the Python packaging landscape.

The script for downloading and analyzing the data can be found in my GitHub repository. It contains the results of the duckdb query (so you don't have to download the full dataset) and the pickled dictionary mapping the file hashes to the build backends, saving you days of downloading and analyzing the pyproject.toml files yourself.
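For reference, a minimal pandas sketch of the top-4 grouping and 28-day binning described above (not the actual analysis script; it assumes the results DataFrame with the backend and uploaded_on columns from the query):

import pandas as pd

# keep the four most common backends, fold everything else into "other"
top4 = results.backend.value_counts().index[:4]
results["group"] = results.backend.where(results.backend.isin(top4), "other")

# count uploads per backend in 28-day bins and normalize each bin to 1
bins = pd.Grouper(key="uploaded_on", freq="28D")
counts = results.groupby([bins, "group"]).size().unstack(fill_value=0)
shares = counts.div(counts.sum(axis=1), axis=0)
shares.plot.bar(stacked=True)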

1 July 2023

Bastian Venthur: dotenv-cli update

Thanks to Nicholas Guriev, dotenv-cli now uses exec instead of popen to create the new process on POSIX systems. As a refresher, dotenv-cli is a package that provides the dotenv command. dotenv reads the .env file from the current directory, puts the contents in the environment variables, and executes the given command with the extra environment variables set. dotenv comes in handy if you follow the 12 factor app methodology or just need to run a program locally with specific environment variables set. With this new change, when you call
dotenv my_awesome_tool
instead of forking a new process for my_awesome_tool, effectively creating a child process of dotenv, dotenv now uses exec to become the new process. This is a bit cleaner, as there is no longer a dotenv process running that you don't actually care about, and it is less error-prone when sending signals such as SIGTERM to the process that runs my_awesome_tool. Unfortunately, exec does not work properly under Windows, so there we still fall back to using popen. This new feature is available in version 3.2.0, which is available on PyPI and debian/unstable.
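A minimal sketch of the difference (illustrative only, not dotenv-cli's actual code; the .env parsing is faked with a single variable):

import os
import subprocess
import sys

env = {**os.environ, "MY_VAR": "value from .env"}  # pretend we parsed .env
cmd = sys.argv[1:]                                  # e.g. ["my_awesome_tool"]

if os.name == "posix":
    # replace this process entirely; signals now go straight to the tool
    os.execvpe(cmd[0], cmd, env)
else:
    # Windows fallback: spawn a child process and wait for it
    sys.exit(subprocess.run(cmd, env=env).returncode)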

17 June 2023

Bastian Venthur: Blag 2.0 released

A few days ago, I released a major update of blag, my blog-aware static-site generator, which introduces a few backwards-incompatible changes and many improvements over the old version.

Good-looking default theme

The old bare-bones default theme has been replaced with a good-looking one, based on the one used on this blog:

[Screenshot: blag's default theme]

It comes with a light and a dark theme that switch automatically based on the browser setting, as well as matching light and dark syntax-highlighting themes for code blocks.

Improved quickstart

The blag quickstart command has been improved. In addition to generating the configuration, it now also populates the working directory with the templates, static and content directories, containing the updated default theme and a few content pages to get you started.

No internal fallback templates anymore

Related to the changes in quickstart, the internal fallback templates have been removed, and blag now completely relies on the templates in the local templates directory. This makes it more transparent for the user what is happening, while simplifying blag's internal logic. However, this is a backwards-incompatible change! In the case of a missing template, the user will be warned with a hint on how to obtain the missing template.

Index and archive are now separate

Previously, the front page would always show the archive of all articles. This is not very useful when your blog contains more than a few dozen articles. With blag 2.0, the previous archive has been split into index and archive, where index is the front page showing only the most recent 15 articles by default and linking to the archive, which shows all articles. There are also two corresponding templates in the templates directory.

Miscellaneous

Blag 2.0 is available on PyPI, debian/unstable and GitHub.

18 December 2022

Bastian Venthur: The State of Python Packaging in 2022

Every year or so, I revisit the current best practices for Python packaging. This was my summary for 2021; here's the update for 2022.

PyPA

PyPA is still the place to go for information, best practices and tutorials for packaging Python projects. My only criticism from last year, namely that PyPA was heavily biased towards their own tooling (e.g. pipenv), has been addressed: the tool recommendations section now lists several tools for the same purpose, with their own ones not necessarily being the first anymore.

setup.py, setup.cfg, requirements.txt, Pipfile, pyproject.toml... oh my!

This is the reason why I'm revisiting the documentation every year: to see what's the current way to go. Good progress has been made since last year.

Bye setup.py and setup.cfg, hello pyproject.toml

pyproject.toml finally got mature enough to replace setup.py and setup.cfg in most cases. Recent versions of setuptools and pip now fully support pyproject.toml, and even PyPA's packaging tutorial completely switched its example project away from setup.py towards pyproject.toml, making it an official recommendation. So now you can replace your setup.py with pyproject.toml. If you already had some kind of declarative configuration in setup.cfg, you can move that into pyproject.toml as well. Most tools, like mypy or pytest, also support configuration in pyproject.toml (flake8 being a notable exception), so there's no reason to keep setup.cfg around anymore. Actually, if you migrate to pyproject.toml, it is best to do it properly and remove setup.py and setup.cfg, as setuptools behaves a bit buggily when building a package that has either of them alongside the pyproject.toml.

requirements.txt

requirements.txt is still needed if you develop a deployable application (vs. a library) and want to provide pinned dependencies, i.e. with the specific versions that you've tested your application with. Usually, the list of requirements in requirements.txt is the same as defined in pyproject.toml, but with pinned versions.

Pipfile + Pipfile.lock

I still completely ignore Pipfile and Pipfile.lock, as they are only supported by pipenv and not backed by any standard.

Summary

The major change this year was the proper support of pyproject.toml. I am slowly replacing all setup.py and setup.cfg files in my projects with pyproject.toml and haven't discovered any issues yet. Even packaging those packages as Debian packages is well supported by Debian's tooling. I'm still running a quite boring stack based on pyproject.toml and requirements.txt, ignoring more advanced tools like poetry for dependency management. My build system defined in pyproject.toml requires setuptools, and I'm using build for building and twine for uploading. Since PyPA changed the packaging tutorial towards pyproject.toml and away from setup.py, I think we will slowly see setup.py and setup.cfg go away over the years. Speaking of PyPA, I'm happy that they changed their attitude towards a more unbiased recommendation of tooling.
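For reference, here is roughly what such a pyproject.toml looks like with the setuptools build backend (the project name and metadata below are placeholders, not a real package):

[build-system]
requires = ["setuptools>=61"]
build-backend = "setuptools.build_meta"

[project]
name = "my-package"
version = "1.0.0"
description = "Metadata that previously lived in setup.py or setup.cfg"
requires-python = ">=3.8"
dependencies = ["requests"]   # unpinned here; pin versions in requirements.txt

[tool.pytest.ini_options]     # tool configuration can live here as well
addopts = "-q"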

12 November 2022

Bastian Venthur: Mastodon

Due to recent events around Twitter, I finally decided to give Mastodon a try. Naturally, I find the idea of an open and decentralized platform much more appealing than the privately owned walled gardens that became so hugely popular in the past two decades. I'm curious whether Mastodon can keep up the momentum of the last two weeks and eventually establish itself as an alternative to Twitter. On that note, I think it will be interesting to see how well moderation of hate speech etc. works and scales on Mastodon. I believe that Twitter, Facebook and the like spend a significant amount of resources on content moderation, so this may become a huge headache for Mastodon instance admins when it gets more popular.

Choosing an Instance

For no particular reason, I went with the mastodon.social instance, so my handle is @venthur@mastodon.social. After my first steps, I realized that choosing the instance does have an impact, particularly if you follow the local timeline. mastodon.social is currently one of the biggest instances, and therefore the local timeline is very busy and the topics naturally very random. Maybe I'll try out a more specialized instance such as fosstodon at some point; one of the awesome features of Mastodon is that it actually supports the migration of accounts between instances! I wonder if there are plans to have an official Debian instance?

First Impressions

So far I'm quite happy with Mastodon. There's good quality content, and I already found a few people that I was following on Twitter here on Mastodon too. Interestingly, some of them are actually more active on Mastodon than on Twitter. But truth be told: many of them do cross-post on both mediums, and most of the people I follow on Twitter are not on Mastodon yet. I do like the concepts of the local and federated timelines, although they are quite busy. I like that you can set the language of your (individual) toots, which allows users to filter their timelines for languages. In practice, this does not work so well yet, and I still see a lot of different languages in my local and federated timelines. I assume the problem is that people just forget to set the language of their toots, so the default language is used. This problem should be easily solvable in the frontend by guessing the language once a few words have been typed. I also like the idea that you can follow hashtags, although for me the results were mixed. I tried to follow #debian and #python and got a lot of toots that were not really relevant for me; in the case of #python I got quite spammed with job ads, so I had to unfollow it again. Unfollowing a hashtag is not very intuitive, as the tags are not in your list of people you follow, so you have to find the page of the hashtag itself (e.g. #debian) and unfollow from there. You see, there are some rough edges here and there, but I find the overall experience much more enjoyable than Twitter.

Debian Folks in the Fediverse?

Which brings me to: are there any Debian Developers or Maintainers out there to follow in the fediverse? I found most of the ones I'm following on Twitter, but I'm sure there are more hiding out there.

1 August 2022

Bastian Venthur: Keychron keyboards fixed on Linux

Last year, I wrote about how to get my buggy Keychron C1 keyboard working properly on Linux by setting a kernel module parameter. Afterwards, I contacted Hans de Goede, since he was the last one who contributed a major patch to the relevant kernel module. After some debugging, it turned out that the Keychron keyboards are indeed misbehaving when set to Windows mode. Almost a year later, Bryan Cain provided a patch fixing the behavior, which has now been merged into the Linux kernel in 5.19. Thank you, Hans and Bryan!

18 June 2022

Bastian Venthur: blag is now available in Debian

Last year, I wrote my own blog-aware static site generator in Python. I called it blag, named after the blag of the webcomic xkcd. Now I finally got around to packaging and uploading blag to Debian. It passed the NEW queue and is now part of the distribution. That means if you're using Debian, you can install it via:
sudo aptitude install blag
Ubuntu will probably follow soon. For every other system, blag is also available on PyPI:
pip install blag
To get started, you can
mkdir blog && cd blog
blag quickstart                        # fill out some info
nvim content/hello-world.md            # write some content
blag build                             # build the website
Blag is aware of articles and pages: the difference is that articles are part of the blog and will be added to the Atom feed, the archive, and aggregated in the tag pages. Pages are just rendered out to HTML. Articles and pages can be freely mixed in the content directory; what differentiates an article from a page is the existence of the date metadata element:
title: My first article
description: Short description of the article
date: 2022-06-18 23:00
tags: blogging, markdown
## Hello World!
Lorem ipsum.
[...]
blag also comes with a dev server that rebuilds the website automatically on every change detected; you can start it using:
blag serve
The default theme looks quite ugly, and you probably want to create your own styling to make it more beautiful. The process is not very difficult if you're familiar with Jinja templating. Help on that can be found in the Templating section of the online documentation, the offline version in the blag-doc package, or the man page, respectively. Speaking of the blag-doc package: packaging it was surprisingly tricky, and it also took me a lot of trial and error to realize that dh_sphinxdocs alone does not automatically put the generated HTML output into the appropriate package; you rather have to list it in the package.docs file (i.e. blag-doc.docs) so dh_installdocs can install it properly.

19 December 2021

Bastian Venthur: Managing dotfiles with GNU Stow

Many developers manage their user-specific application configuration, also known as dotfiles, in a version control system such as git. This allows for keeping track of changes and synchronizing the dotfiles across different machines. Searching on GitHub, you'll find thousands of dotfile repositories. As your dotfiles are sprinkled all over your home directory, managing them in a single repository is not trivial: how do you make sure that your .bashrc, .tmux.conf, etc. that live in your dotfile repository appear in the proper places in your home directory? The most common solution is to use symlinks, so that the .tmux.conf in your home directory is just a symlink pointing to the appropriate file in your dotfile repository:
$ ls -l ~/.tmux.conf
lrwxrwxrwx 1 venthur venthur 34 18. Dez 22:53 /home/venthur/.tmux.conf -> git/dotfiles/tmux/.tmux.conf
This leads immediately to another problem: how do you manage the symlinks? For the longest time I just manually maintained the symlinks on the various machines, but this approach does not scale well with the number of dotfiles and machines you're using this repository on. Often, people write their own shell scripts that help them with the maintenance of the symlinks, but at least the solutions I've seen so far did not convince me. Last year I stumbled upon GNU Stow, an unpretentious little tool that does not reveal at first sight how useful it would be for the job. The description on the website says:
GNU Stow is a symlink farm manager which takes distinct packages of software and/or data located in separate directories on the filesystem, and makes them appear to be installed in the same place.
Right. How does it work? In stow's terminology, a package is a set of files and directories that need to be installed in a particular directory structure. The target directory is the root of the tree in which the packages appear to be installed. When you stow a package, stow creates symlinks in the target directory that point into the package. Let's say I have my dotfiles repository in ~/git/dotfiles/. Within this repository, I have a tmux package, containing the .tmux.conf dotfile:
$ pwd
/home/venthur/git/dotfiles
$ find tmux
tmux                # the package
tmux/.tmux.conf     # the dotfile
The target directory is my home directory, as this is where the symlinks need to be created. I can now stow the tmux package into the target directory like so:
$ stow --target=/home/venthur tmux
and stow will create the appropriate symlinks to the contents of the package into the target directory:
$ ls -l ~/.tmux.conf
lrwxrwxrwx 1 venthur venthur 34  2. Jun 2021  /home/venthur/.tmux.conf -> git/dotfiles/tmux/.tmux.conf
Note that the name of the package (i.e. the name of the directory) does not matter, as stow points the symlinks into the package, so you can choose it freely. I usually use the name of the program that the configuration belongs to as the package name. Your package can also contain several files or even a complex directory structure. Let's look at the configuration for neovim, which lives below ~/.config/nvim/:
$ pwd
/home/venthur/git/dotfiles
$ find neovim
neovim
neovim/.config
neovim/.config/nvim
neovim/.config/nvim/init.vim
$ stow --target=/home/venthur neovim
$ ls -l ~/.config/nvim
lrwxrwxrwx 1 venthur venthur 41  2. Jun 2021  /home/venthur/.config/nvim -> ../git/dotfiles/neovim/.config/nvim
At this point we should mention that the target directory for my dotfiles will always be my home directory, so the contents of the packages are either the configuration files or the directory structure as they live in my home directory.

Deleting a package from the target directory

You can also remove (unstow) a package from the target directory again, using the --delete parameter:
$ ls -l ~/.tmux.conf
lrwxrwxrwx 1 venthur venthur 34 18. Dez 22:53 /home/venthur/.tmux.conf -> git/dotfiles/tmux/.tmux.conf
$ stow --target=/home/venthur --delete tmux/
$ ls -l ~/.tmux.conf
ls: cannot access '/home/venthur/.tmux.conf': No such file or directory
Stowing several packages at once

Since your dotfile repository will likely contain more than one package, it makes sense to combine the individual stow commands into one. So instead of stowing everything individually,
$ stow --target=/home/venthur tmux
$ stow --target=/home/venthur vim
$ stow --target=/home/venthur neovim
you can stow everything at once:
$ stow --target=/home/venthur */
Note that I use */ instead of * to match all directories (i.e. packages), since my dotfiles repository also contains a README.md and a makefile.

Putting it all together

My dotfiles repository contains a makefile that allows me to create/update or delete all symlinks at once:
all:
        stow --verbose --target=$$HOME --restow */
delete:
        stow --verbose --target=$$HOME --delete */
The --restow parameter tells stow to unstow the packages first before stowing them again, which is useful for pruning obsolete symlinks from the target directory. Et voilà! Whenever I make a change in my dotfiles repository that involves creating or deleting a dotfile (or a package), I simply call:
$ make
and everything is updated. To delete all dotfile-related symlinks from this machine, I simply run:
$ make delete

26 June 2021

Bastian Venthur: The State of Python Packaging in 2021

Every year or so, I revisit the current best practices for Python packaging, i.e. the way you're supposed to distribute your Python packages. The main source is packaging.python.org, where the official packaging guidelines are. It is worth noting that the way you're supposed to package your Python applications is not defined by Python or its maintainers, but rather delegated to a separate entity, the Python Packaging Authority (PyPA).

PyPA

PyPA does an excellent job providing us with information, best practices and tutorials regarding Python packaging. However, there's one thing that irritates me every single time I revisit the page, and that is the misleading recommendation of their own tool pipenv. Quoting from the tool recommendations section of the packaging guidelines:
Use Pipenv to manage library dependencies when developing Python applications. See Managing Application Dependencies for more details on using pipenv.
PyPA recommends pipenv as the standard tool for dependency management, at least since 2018. A bold statement, given that pipenv only started in 2017, so the Python community cannot have had enough time to standardize on the workflow around that tool. There have been no releases of pipenv between 2018-11 and 2020-04; that's 1.5 years for the standard tool. In the past, pipenv also hasn't been shy about pushing breaking changes in a fast-paced manner. PyPA still advertises pipenv all over the place and only mentions poetry a couple of times, although poetry seems to be the more mature product. I understand that pipenv lives under the umbrella of PyPA, but I still expect objectiveness when it comes to tool recommendations. Instead of making such claims, they should provide a list of competing tools and a fair feature comparison.

Distributions

You would expect exactly one distribution for Python packages, but here in Python land we have several, the most popular ones being PyPI (the official one) and Anaconda. Anaconda is more geared towards data scientists. The main selling point for Anaconda back then was that it provided pre-compiled binaries. This was especially useful for data-science related packages which depend on libatlas, -lapack, -openblas, etc. and need to be compiled for the target system. This problem has mostly been solved with the wide adoption of wheels, but you still encounter some source-only uploads to PyPI that require you to build stuff locally on pip install. Of course, there are also the Python packages distributed by the operating system, Debian in my case. While I was a firm believer in only using those packages provided by the OS in the very past, I moved to the opposite end of the spectrum throughout the years, and am only using the minimal packages provided by Debian to bootstrap my virtual environments (i.e. pip, setuptools and wheel). The main reason is outdated or missing libraries, which is expected: Debian cannot hope to keep up with all the upstream changes in the ecosystem, and that is by design and fine. However, with the recent upgrade of manylinux, even the pip provided by Debian/unstable was too outdated, so for a while you basically had to pip install --upgrade pip, otherwise you'd end up compiling every package you'd try to install via pip. So I'm sticking to the official PyPI distribution wherever possible. However, compared to the Debian distribution it feels immature. In my opinion, there should be compiled wheels available for all packages that need them, built and provided by PyPI. Currently, the wheels provided are the ones uploaded by the upstream maintainers. This is not enough, as they usually build wheels only for one platform. Sometimes they don't upload wheels in the first place, relying on the users to compile during install. Then you have manylinux, an excellent idea to create some common ground for a portable Linux build distribution. However, sometimes when a new version of manylinux is released, some upstream maintainers immediately start supporting only that version, breaking a lot of systems. A setup similar to Debian's, where the authors only do a source upload and the wheels are compiled on PyPI infrastructure for all available platforms, is probably the way to go.

setup.py, setup.cfg, requirements.txt, Pipfile, pyproject.toml... oh my!

This is the part for which I'm revisiting the documentation every year, to see what's the current way to go.
The main point of packaging your Python application is to define the package's metadata and (build) dependencies.

setup.py + requirements.txt

For the longest time, setup.py and requirements.txt were (and, spoiler alert: still are) the backbone of your packaging efforts. In setup.py you define the metadata of your package, including its dependencies. If your project is a deployable application (vs. a library), you'll very often provide an additional requirements.txt with pinned dependencies. Usually the list of requirements is the same as defined in setup.py, but with pinned versions. The reason why you avoid version pinning in setup.py is that it would interfere with other pinned dependencies from the other packages you try to install.

setup.cfg

setup.cfg is a configuration file that is used by many standard tools in the Python ecosystem. Its format is ini-style, and each tool's configuration lives in its own stanza. Since 2016, setuptools supports configuring setup() using setup.cfg files. This was exciting news back then; however, it does not completely replace the setup.py file. While you can move most of the setup.py configuration into setup.cfg, you'll still have to provide that file with an empty setup() in order to allow for editable pip installs. In my opinion, that makes this feature useless, and I'd rather stick to setup.py with a properly populated setup() until that file can be completely replaced with something else.

Pipfile + Pipfile.lock

Pipfile and Pipfile.lock are supposed to replace requirements.txt some day. So far they are not supported by pip or mentioned in any PEP. I think only pipenv supports them, so I'd ignore them for now.

pyproject.toml

PEP 518 introduces the pyproject.toml file as a way to specify build requirements for your project. PEP 621 defines how to store project metadata in it. pip and setuptools support pyproject.toml to some extent, but not to a point where it completely replaces setup.py yet. Many of Python's standard tools already allow for configuration in pyproject.toml, so it seems this file will slowly replace setup.cfg and probably setup.py and requirements.txt as well. But we're not there yet. poetry has an interesting approach: it allows you to write everything into pyproject.toml and generates a setup.py for you at build time, so it can be uploaded to PyPI. Ironically, Python settled for the TOML file format here, although there is currently no support for reading TOML files in Python's standard library.

Summary

While some alternatives exist, in 2021 I still stick to setup.py and requirements.txt to define the metadata and dependencies of my projects. Regarding the tooling, pip and twine are sufficient and do their job just fine. Alternatives like pipenv and poetry exist. The scope of poetry seems to be better aligned with my expectations, and it seems the more mature project compared to pipenv, but in any case I'll ignore both of them until I revisit this issue in 2022.

Closing Thoughts

While packaging in Python has improved a lot in the last years, I'm still somewhat put off by how such a core aspect of a programming language is treated within Python. With some jealousy, I look over to the folks at Rust and how they seem to have gotten this aspect right from the start. What would in my opinion improve the situation?

30 April 2021

Bastian Venthur: Getting the Function keys of a Keychron working on Linux

Having destroyed the third Cherry Stream keyboard in 4 years, I wanted to try a more substantial keyboard for a change. After some research I decided that I want a mechanical, wired, tenkeyless keyboard without any fancy LEDs. In the end I settled on a Keychron C1 with red switches. It meets all requirements, looks very nice and the price is reasonable.

Surprise!

After the keyboard was delivered, I connected it to my Debian machine and was unpleasantly surprised to notice that the Function keys did not work at all. The keyboard shares the multimedia keys with the F-keys, and you have an fn key that supposedly switches between the two modes, like you're used to on a laptop. On Linux, however, you cannot access the F-keys at all: pressing fn + F1 or F1 makes no difference, you'll always get the multimedia key. Switching the keyboard between Windows and Mac mode makes no difference either; in both modes the F-keys are not accessible. Apparently Keychron is aware of the problem, because the quick start manual tells you:
We have a Linux user group on facebook. Please search Keychron Linux Group on facebook. So you can better experience with our keyboard.
Customer support at its finest! So at this point you should probably just send the keyboard back, get the refund and buy a proper one with functioning F-keys.

The fix

Test if this command fixes the issue and enables the fn + F-key combos:
# as root:
echo 2 > /sys/module/hid_apple/parameters/fnmode
Depending on the mode the keyboard is in, you should now be able to use the F-keys by simply pressing them, and the multimedia keys by pressing fn + F-key (or the other way round). To switch the default mode of the F-keys to function or multimedia mode, press and hold fn + X + L for 4 seconds. If everything works as expected, you can make the change permanent by creating the file /etc/modprobe.d/hid_apple.conf and adding the line:
options hid_apple fnmode=2
This works regardless of whether the keyboard is in Windows or Mac mode, and that might hint at the problem: in both cases Linux thinks you're connecting a Mac keyboard.

The rant

Although the fix was not very hard to find and apply, this experience still leaves a foul taste. I naively assumed that the problem of having a properly functioning keyboard that simply works when you plug it in had been thoroughly solved by 2021. To make it worse, I assume Keychron must be aware of the problem, because the other Keychron models have the same issue! But instead of fixing it on their end, they forward you to a Facebook community and expect you to somehow fix it yourself. So dear Keychron, you do make really beautiful keyboards! But before you release your next model with the same bug, maybe invest a bit in fixing the basics? I see that your keyboards support firmware updates for Windows and Mac; maybe you can talk to the folks over at the Linux Vendor Firmware Service and support Linux as well? Maybe you can even fix this annoying bug for the keyboards you've already sold? I found it really cute that you sent different keycaps for Windows and Mac setups; a few disappointed Linux users might accept an apology in the form of a Linux cap.

31 March 2021

Bastian Venthur: Writing Makefiles for Python Projects

I'm a big fan of Makefiles. Almost all my side projects are using them, and I've been advocating their usage at work too. Makefiles give your contributors an entry point on how to do certain things like building, testing, or deploying. And if done correctly, they can massively simplify your CI/CD pipeline scripts, as they can often just stupidly call the respective make targets. Most importantly, they are a very convenient shortcut for you as a developer as well. For Python projects, where I'm almost always using virtual environments, I've been using two different strategies for Makefiles:
  1. assuming that make is executed inside the virtual environment
  2. wrapping all virtual environment calls inside make
Both strategies have their pros and cons.

Assuming make is executed inside the venv

Let's have a look at a very simple Makefile that allows for building, testing and releasing a Python project:
all: lint test
.PHONY: test
test:
    pytest
.PHONY: lint
lint:
    flake8
.PHONY: release
release:
    python3 setup.py sdist bdist_wheel upload
clean:
    find . -type f -name '*.pyc' -delete
    find . -type d -name __pycache__ -delete
This is straightforward and a potential contributor immediately knows the entry points to your project. Assuming there is a venv installed already, you have to activate it first and run the make commands afterwards:
$ . venv/bin/activate
$ make test
The downside is, of course, that you have to activate the venv for every new shell, which can get a bit annoying when you spawn a new terminal in tmux or put vim into the background to run make. Activating the venv inside make will not work, as each recipe runs in its own shell; moreover, each command in each recipe runs in its own shell too. There are workarounds for the latter, i.e. using the .ONESHELL flag, but that does not solve the issue of a new shell in each recipe.

Wrapping the venv calls inside make

The second approach mitigates the issue of activating the venv altogether. I've borrowed this idea mostly from makefile.venv and simplified it a lot for my needs.
# system python interpreter. used only to create virtual environment
PY = python3
VENV = venv
BIN=$(VENV)/bin
# make it work on windows too
ifeq ($(OS), Windows_NT)
    BIN=$(VENV)/Scripts
    PY=python
endif
all: lint test
$(VENV): requirements.txt requirements-dev.txt setup.py
    $(PY) -m venv $(VENV)
    $(BIN)/pip install --upgrade -r requirements.txt
    $(BIN)/pip install --upgrade -r requirements-dev.txt
    $(BIN)/pip install -e .
    touch $(VENV)
.PHONY: test
test: $(VENV)
    $(BIN)/pytest
.PHONY: lint
lint: $(VENV)
    $(BIN)/flake8
.PHONY: release
release: $(VENV)
    $(BIN)/python setup.py sdist bdist_wheel upload
clean:
    rm -rf $(VENV)
    find . -type f -name '*.pyc' -delete
    find . -type d -name __pycache__ -delete
The equivalent Makefile now looks immediately more complicated, so let's break it down. Instead of just calling pytest, flake8 and python, assuming the venv is already activated or all dependencies are installed on the system directly, we explicitly call the ones from the venv by prefixing the command with the path to the venv's bin directory. To ensure the venv exists, each of the recipes depends on the $(VENV) target, which ensures we always have an up-to-date venv installed. This works because the . venv/bin/activate script basically does just the same: it puts the venv's bin directory before anything else in your PATH, so each call to python, etc. finds the one installed in the venv first. While the Makefile is a bit more complicated, we can now just call
$ make test
and don't deal with venvs directly any more (well, for those simple cases at least...). If you don't need to support Windows, you can remove the appropriate block and the Makefile looks relatively tame, even for people who don't use make very often.

Which one is better?

I think the second approach is more convenient. I've used the first approach happily for years and only learned about the second one quite recently. I haven't really noticed any downsides yet, but I do realize that almost all Python projects with Makefiles I've checked out seem to prefer the first approach. I wonder why that is?

24 March 2021

Bastian Venthur: Perfection is Achieved When There is Nothing Left to Take Away

In the 14 years since starting this blog, I changed the software several times, with each iteration simplifying requirements and setup.

Wordpress

In 2007, I started blogging. Back then I used a self-hosted Wordpress instance, running on my server. I was never really comfortable with the technology stack involved: a programming language I didn't speak, a DB I had little experience with, a relatively heavy setup and, of course, the occasional security issues related to that setup. All that just to have a somewhat dynamic website with pingbacks and comments generated by your visitors... totally worth it and exciting times! You wrote something to the "lazyweb" department and people would actually comment with useful advice! Then came the spammers. First, it was just a little, and you could easily fight it off with the Akismet plugin. Over the years, however, the spam-to-useful-comments ratio shifted to ridiculous levels and the comment section itself became a burden. Hosting the comments on a separate platform like disqus was not an option for me, so I finally turned the comments off. Without the need for a comment section, the whole idea of storing mostly textual content in a DB just to have it served as a website suddenly seemed like overkill and an unnecessary security risk. So I started looking for alternatives. I wanted to maintain my content as plain text files. Text files are cool because they are future proof, portable and can be managed in git. Consequently, I was searching for a static site generator.

Pelican

Around 2016 I finally migrated away from Wordpress to Pelican. Pelican met all my requirements. I could store my content as plain Markdown files in a git repository. Pelican is written in Python, a language with which I'm very comfortable. Pelican deals with blog posts and "pages" (i.e. non-blog articles) and, most importantly, Pelican just generates static HTML. So all that's required on the server side is a simple web server. Conveniently, it also comes with an importer that allowed me to export the content of my old Wordpress instance straight into Markdown files. Migrating to Pelican was a huge relief: I finally had all my content as plain text in a git repository, my server was a lot more secure without the Wordpress instance running, and removing MySQL freed up a lot of resources on that machine.

blag

Fast forward to 2021. While I was quite happy with Pelican, it still had too many features I don't need. My blogging experience can be stripped down to a bunch of Markdown files that serve either as articles or pages, some static files, some templating for a decent design and an Atom feed. In particular, I do not need pages/articles in multiple languages, a plugin system, support for multiple authors and tons of settings. So I did what every self-respecting nerd would do: I wrote my own site generator. Not to write something better than Pelican, but rather something lesser. Something that does less while still being functionally adequate. I called the result "blag", named after the blag of the webcomic xkcd. blag is a blog-aware, static site generator written in Python. It supports the basic features you'd expect from a blog, like an archive, tags and an Atom feed. It converts Markdown (with fenced code blocks) and supports Jinja2 templating. With this setup I'm quite happy for now. Maybe sometime there will be a simpler RSS/Atom successor that does not require site-specific metadata (i.e.
only stuff that can be generated from the articles themselves), so I could get rid of blag's only required configuration.

10 February 2021

Bastian Venthur: Installing Debian on a Thinkpad T14s

Recently, I got a new work laptop. I opted for Lenovo's Thinkpad T14s that comes with AMD's Ryzen 7 PRO 4750U Processor. Once everything is installed, it works like a charm: all hardware is supported and works out of the box with Debian. However, the tricky part is actually installing Debian onto that machine: The laptop lacks a standard Ethernet port and comes with Intel's Wi-Fi 6 AX200 module. So if you don't happen to have a docking station or an Ethernet adapter available during install, you'll have to install everything over WiFi. The WiFi module, however requires non-free firmware and this is where the fun starts. First, I downloaded an official netinst image and copied it onto a USB drive. Halfway through the installation, it complained that firmware for the WiFi module was missing, and I was stuck as I couldn't continue the installation without network access. Ok, then -- missing non-free firmware it is. The wiki suggests using an unofficial image instead, as it supposedly contains "all non-free firmware packages directly". So I tried an unofficial netinst image with non-free firmware. That also did not work, with the same error as above: the required firmware was missing. I checked the image later and actually couldn't find the non-free firmware either. Hum. In the end, I had to prepare a second USB drive with the firmware downloaded from here. I unpacked the firmware into /firmware on the second USB. The installer checks at some point during the installation for firmware installed on other removable media and (hopefully) finds it. In my case it did, and I finally could install everything. I'm quite happy with the laptop, but I have to admit how incredibly difficult it still is to install Debian on recent hardware. As a Debian Developer, I do understand Debian's position on non-free firmware, on the other hand however, a less technical person would probably have given up at some point and just installed some other Operating System.

11 January 2021

Bastian Venthur: Dear Apple,

In the light of WhatsApp's recent move to enforce new privacy agreements onto its users, alternative messenger services like Signal are currently gaining some more momentum. While this sounds good, it is hard to believe that this will be more than a dent in WhatsApp's user base. WhatsApp is way too ubiquitous, and the whole point of using such a service for most users is to use the one that everyone is using. Unfortunately. Convincing WhatsApp users to additionally install Signal is hard: they already have SMS for the few people that are not using WhatsApp, and now expecting them to install a third app for the same purpose seems ridiculous. Android mitigates this problem a lot by allowing other apps like Signal to become the default SMS/MMS app on the phone. Suddenly people are able to use Signal for SMS/MMS and Signal messages transparently. Signal is smart enough to figure out if the conversation partner is using Signal and enables encryption, video calls and other features. If not, it just falls back to plain old SMS. All in the same app, very convenient for the user! I don't really get why the same thing is not possible on iOS. Apple is well known for taking things like privacy and security for its users seriously, and this seems like a low-hanging fruit. So dear Apple, wouldn't now be a good time to team up with WhatsApp alternatives like Signal to help users make the right choice?

29 December 2020

Bastian Venthur: Creating Beautiful Github Streaks

Github streaks are a pretty fun and harmless way of visualizing your contributions on github or gitlab. Some people use them to brag a little, some companies use them to check out potential candidates. It's all good as long as everyone is aware how easily you can manipulate them. In order to fake a commit date in the past, you have to set the GIT_AUTHOR_DATE and GIT_COMMITTER_DATE environment variables on git commit; both RFC 2822 and ISO 8601 formats are accepted (for more info see here). When generating hundreds of commits, the --allow-empty flag is very useful, as we don't have to actually change the source tree in order to generate a commit. So the command to generate a fake commit looks like this:
GIT_AUTHOR_DATE="$TIMESTAMP" GIT_COMMITTER_DATE="$TIMESTAMP" git commit --allow-empty -m "$TIMESTAMP"
Execute it on an empty git repository and push to github or gitlab, and you'll see the result after some time. To automate this, I wrote myself a little Python script (github, pypi). You can simply install it via:
pip install python-streak
and run it with:
streak
in an empty git repository. By default, streak will create 3 commits per day with a probability of ~71% per commit for 100 years, starting from 1980-05-09. These parameters can be tuned with:

Generating a few commits per day for the range of 10 years took roughly 2 minutes on my computer, and man was I a busy beaver in the 80s!

[Figure: Streak 1989]

I know, I'm for sure not the first one to discover this kind of thing, but it was a fun afternoon project anyways!
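For illustration, the core of such a tool boils down to a loop like the following sketch (this is not python-streak's actual implementation; start date, range and probability are placeholders mirroring the defaults mentioned above, except for a shorter range):

import os
import random
import subprocess
from datetime import datetime, timedelta

day = datetime(1980, 5, 9)
for _ in range(365 * 10):            # ten years of fake history
    timestamp = day.isoformat()      # ISO 8601, accepted by git
    env = {**os.environ,
           "GIT_AUTHOR_DATE": timestamp,
           "GIT_COMMITTER_DATE": timestamp}
    for _ in range(3):               # up to three commits per day
        if random.random() < 0.71:
            subprocess.run(["git", "commit", "--allow-empty", "-m", timestamp],
                           env=env, check=True)
    day += timedelta(days=1)

Run it inside an empty git repository and push, and the streak fills up.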

29 February 2020

Chris Lamb: Free software activities in February 2020

Here is my monthly update covering what I have been doing in the free software world during February 2020 (previous month): For the Tails privacy-oriented operating system, I uploaded the following packages to Debian:
Reproducible builds One of the original promises of open source software is that distributed peer review and transparency of process results in enhanced end-user security. However, whilst anyone may inspect the source code of free and open source software for malicious flaws, almost all software today is distributed as pre-compiled binaries. This allows nefarious third-parties to compromise systems by injecting malicious code into ostensibly secure software during the various compilation and distribution processes. The motivation behind the Reproducible Builds effort is to provide the ability to demonstrate these binaries originated from a particular trusted source release: if identical results are generated from a given source in all circumstances, reproducible builds provides the means for multiple third-parties to reach a consensus on whether a build was compromised via distributed checksum validation or some other scheme. The initiative is proud to be a member project of the Software Freedom Conservancy, a not-for-profit 501(c)(3) charity focused on ethical technology and user freedom. Conservancy acts as a corporate umbrella allowing projects to operate as non-profit initiatives without managing their own corporate structure. If you like the work of the Conservancy or the Reproducible Builds project, please consider becoming an official supporter. This month, I: In our tooling, I also made the following changes to diffoscope, our in-depth and content-aware diff utility that can locate and diagnose reproducibility issues, including uploading version 137 to Debian:
Debian I submitted a Request for Package (RFP) bug for hsd, a blockchain-based top-level domain DNS protocol implementation that underpins Handshake and worked on some initial packaging. (#952472)
Debian LTS This month I have worked 18 hours on Debian Long Term Support (LTS) and 12 hours on its sister Extended LTS project. You can find out more about the project via the following video:
Uploads Finally, I made a non-maintainer upload of adminer (4.7.6-1) on behalf of Alexandre Rossi.

13 October 2015

Bastian Venthur: Thanks!

Two months ago I asked for help porting python-debianbts to Python 3. python-debianbts is a tiny little library that allows for querying Debian's bug tracking system. Since Debian's reportbug depends on it, the library is installed on circa 80% of Debian installations. The main blocker back then was that python-debianbts depended on SOAPpy, which was not available for Python 3. So before we could port python-debianbts to Python 3, we had to migrate from SOAPpy to pysimplesoap, a daunting task given the beast that SOAP is. Fortunately, Gaetano Guerriero heard my call and helped a lot. First he migrated python-debianbts to pysimplesoap. Then he ported python-debianbts to Python 3, and now he is still busy fixing bugs and providing me with pull requests. I'm very satisfied with the outcome. His pull requests are very high quality and usually come with the appropriate unit tests included. While he is doing the major grunt work, I merely do some occasional nitpicking and uploading to Debian/unstable. Thank you very much, Gaetano! If you ever come to Berlin, please drop me a note so I can invite you to a beer or two.

18 August 2015

Bastian Venthur: Please Help to Port python-debianbts to Python3

Dear Lazyweb, I'm currently trying to find a way to port python-debianbts to Python 3. Debian's standard bug reporting tool reportbug depends on python-debianbts and can thus not convert to Python 3 if python-debianbts does not as well. Unfortunately, python-debianbts depends on SOAPpy for parsing the Debian bug tracker's responses, and that library is not ported to Python 3 yet, and probably never will be. I'm planning to replace SOAPpy with pysimplesoap, which is available for Python 2 and Python 3. Unfortunately, debbugs does not support WSDL, which makes parsing of the replies extremely painful and error-prone. I wonder if there is a SOAP/Python expert out there who'd be willing to give some assistance in porting python-debianbts to pysimplesoap? python-debianbts' repository is on GitHub and patches are very welcome. Since SOAP is quite a beast and debbugs uses it for read-only purposes only, another attractive solution would be to replace/augment debbugs' API with something much simpler, like JSON. That would make parsing extremely easy, as many programming languages, including Python, support JSON without any external libraries. In theory this could be quite easy, but it requires some serious Perl skills.
