
Every year or so, I revisit the current best practices for Python packaging,
i.e. the way you're supposed to distribute your Python packages. The main
source is packaging.python.org, where the official packaging guidelines
are. It is worth noting that the way you're supposed to package your Python
applications is not defined by Python or its maintainers, but rather
delegated to a separate entity, the Python Packaging Authority (PyPA).
PyPA
PyPA does an excellent job providing us with information, best practices and
tutorials regarding Python packaging. However, there's one thing that
irritates me every single time I revisit the page, and that is the misleading
recommendation of their own tool, pipenv.
Quoting from the tool recommendations section of the packaging guidelines:
Use Pipenv to manage library dependencies when developing Python
applications. See Managing Application Dependencies for more details on
using pipenv.
PyPA has recommended pipenv as the standard tool for dependency management
since at least 2018. A bold statement, given that pipenv only started in
2017, so the Python community cannot have had enough time to standardize on
the workflow around that tool. There were no releases of pipenv between
2018-11 and 2020-04; that's 1.5 years for the standard tool. In the past,
pipenv also hasn't been shy about pushing breaking changes at a fast pace.
PyPA still advertises pipenv all over the place and only mentions poetry a
couple of times, although poetry seems to be the more mature product. I
understand that pipenv lives under the umbrella of PyPA, but I still expect
objectivity when it comes to tool recommendations. Instead of making such
claims, they should provide a list of competing tools and a fair feature
comparison.
Distributions
You would expect exactly one distribution for Python packages, but here in
Python land, we have several. The most popular ones are PyPI, the official
one, and Anaconda. Anaconda is more geared towards data scientists. The main
selling point for Anaconda back then was that it provided pre-compiled
binaries. This was especially useful for data-science related packages which
depend on libatlas, -lapack, -openblas, etc. and need to be compiled for the
target system. This problem has mostly been solved with the wide adoption of
wheels, but you still encounter some source-only uploads to PyPI that require
you to build stuff locally on pip install.
Of course there are also the Python packages distributed by the operating
system, Debian in my case. While I was a firm believer in only using the
packages provided by the OS in the distant past, I moved to the opposite end
of the spectrum over the years, and now use only the minimal packages
provided by Debian to bootstrap my virtual environments (i.e. pip, setuptools
and wheel). The main reason is outdated or missing libraries, which is
expected: Debian cannot hope to keep up with all the upstream changes in the
ecosystem, and that is by design and fine. However, with the recent upgrade
of manylinux, even the pip provided by Debian/unstable was too outdated, so
you basically had to pip install --upgrade pip for a while, otherwise you'd
end up compiling every package you'd try to install via pip.
So I'm sticking to the official PyPI distribution wherever possible. However,
compared to the Debian distribution it feels immature. In my opinion, there
should be compiled wheels for all packages that need them, built and provided
by PyPI. Currently, the wheels provided are the ones uploaded by the upstream
maintainers. This is not enough, as they usually build wheels only for one
platform. Sometimes they don't upload wheels in the first place, relying on
the users to compile during install.
Then you have manylinux, an excellent idea to create some common ground for a
portable Linux build distribution. However, sometimes when a new version of
manylinux is released, some upstream maintainers immediately start supporting
only that version, breaking a lot of systems.
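Which platform a wheel targets can be read from its filename. The following
is a minimal sketch, assuming the filename layout from the wheel
specification (PEP 427): name, version, an optional build tag, then Python,
ABI and platform tags. The helper parse_wheel_filename and both filenames are
made up for illustration.

```python
def parse_wheel_filename(filename):
    """Split a wheel filename into its dash-separated components."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel: {filename!r}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) == 5:
        parts.insert(2, None)  # the optional build tag is absent
    elif len(parts) != 6:
        raise ValueError(f"unexpected wheel filename: {filename!r}")
    keys = ("name", "version", "build", "python", "abi", "platform")
    return dict(zip(keys, parts))


# Hypothetical filenames, for illustration only.
binary = parse_wheel_filename(
    "example_pkg-1.0.0-cp39-cp39-manylinux2014_x86_64.whl"
)
pure = parse_wheel_filename("example_pkg-1.0.0-py3-none-any.whl")

print(binary["platform"])  # manylinux2014_x86_64
print(pure["platform"])    # any
```

A pip that does not know the manylinux tag of the only uploaded wheel skips
that wheel and falls back to building from source.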
A setup similar to Debian's, where the authors only do a source upload and
the wheels are compiled on PyPI infrastructure for all available platforms,
is probably the way to go.
setup.py, setup.cfg, requirements.txt, Pipfile, pyproject.toml, oh my!
This is the part for which I revisit the documentation every year, to see
what's the current way to go.
The main point of packaging your Python application is to define the
package's metadata and (build) dependencies.
setup.py + requirements.txt
For the longest time, setup.py and requirements.txt were (and, spoiler alert:
still are) the backbone of your packaging efforts. In setup.py you define the
metadata of your package, including its dependencies.
If your project is a deployable application (vs. a library) you'll very often
provide an additional requirements.txt with pinned dependencies. Usually the
list of requirements is the same as defined in setup.py but with pinned
versions. The reason why you avoid version pinning in setup.py is that it
would interfere with the pinned dependencies of other packages you try to
install.
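As a rough sketch of this split, assuming a hypothetical application
example-app with two arbitrarily chosen dependencies: METADATA holds the
keyword arguments you would pass to setuptools.setup() in setup.py, with
loose version ranges, while REQUIREMENTS_TXT holds the same packages pinned
to exact versions. The helper pinned_versions is not part of any tool; it
only checks that every line is an exact pin.

```python
# Keyword arguments as they would be passed to setuptools.setup() in
# setup.py. Name, version and dependencies are hypothetical.
METADATA = {
    "name": "example-app",
    "version": "0.1.0",
    "install_requires": ["requests>=2.20", "click>=7.0"],  # loose ranges
}

# Contents of the accompanying requirements.txt: same packages, pinned.
REQUIREMENTS_TXT = """\
requests==2.25.1
click==7.1.2
"""


def pinned_versions(requirements_text):
    """Return {package: version} for a requirements file of exact pins."""
    pins = {}
    for raw_line in requirements_text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments
        if not line:
            continue
        name, separator, version = line.partition("==")
        if not separator:
            raise ValueError(f"not pinned to an exact version: {line!r}")
        pins[name.strip().lower()] = version.strip()
    return pins


pins = pinned_versions(REQUIREMENTS_TXT)
print(pins)
```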
setup.cfg
setup.cfg is a configuration file that is used by many standard tools in the
Python ecosystem. Its format is ini-style and each tool's configuration lives
in its own stanza. Since 2016 setuptools supports configuring setup() using
setup.cfg files. This was exciting news back then; however, it does not
completely replace the setup.py file. While you can move most of the setup.py
configuration into setup.cfg, you'll still have to provide that file with an
empty setup() in order to allow for editable pip installs. In my opinion,
that makes this feature useless and I'd rather stick to setup.py with a
properly populated setup() until that file can be completely replaced with
something else.
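To illustrate the stanza layout, here is a minimal sketch that reads a
made-up setup.cfg with Python's configparser module. The [metadata] and
[options] stanzas are the ones setuptools reads; [flake8] stands in for any
other tool that keeps its settings in the same file. All values are
hypothetical.

```python
import configparser

# A hypothetical setup.cfg: two stanzas for setuptools, one for flake8.
SETUP_CFG = """\
[metadata]
name = example-app
version = 0.1.0

[options]
packages = find:
install_requires =
    requests>=2.20
    click>=7.0

[flake8]
max-line-length = 100
"""

parser = configparser.ConfigParser()
parser.read_string(SETUP_CFG)

# Each tool only looks at its own stanza.
print(parser.sections())
print(parser["metadata"]["name"])

# Indented continuation lines form one multi-line value.
requirements = parser["options"]["install_requires"].split()
print(requirements)
```

The stub setup.py that still has to sit next to this file would contain
nothing but the setuptools import and an empty setup() call.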
Pipfile + Pipfile.lock
Pipfile and Pipfile.lock are supposed to replace requirements.txt some day.
So far they are not supported by pip or mentioned in any PEP. I think only
pipenv supports them, so I'd ignore them for now.
pyproject.toml
PEP 518 introduces the pyproject.toml file as a way to specify build
requirements for your project. PEP 621 defines how to store project metadata
in it. pip and setuptools support pyproject.toml to some extent, but not to a
point where it completely replaces setup.py yet. Many of Python's standard
tools already allow for configuration in pyproject.toml, so it seems this
file will slowly replace setup.cfg and probably setup.py and requirements.txt
as well. But we're not there yet.
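A minimal sketch of what such a file can look like, and how the PEP 518 build
requirements can be read back from it. It assumes Python 3.11 or newer, where
a tomllib module is part of the standard library; on older interpreters the
helper simply refuses to run. The project name and dependencies are made up.

```python
try:
    import tomllib  # standard library from Python 3.11 on
except ModuleNotFoundError:
    tomllib = None

# A hypothetical pyproject.toml: build requirements (PEP 518) and
# project metadata (PEP 621).
PYPROJECT_TOML = """\
[build-system]
requires = ["setuptools", "wheel"]

[project]
name = "example-app"
version = "0.1.0"
dependencies = ["requests>=2.20"]
"""


def build_requirements(text):
    """Return the PEP 518 build requirements from pyproject.toml text."""
    if tomllib is None:
        raise RuntimeError("tomllib needs Python 3.11 or newer")
    return tomllib.loads(text)["build-system"]["requires"]


if tomllib is not None:
    print(build_requirements(PYPROJECT_TOML))
```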
poetry has an interesting approach: it will allow you to write everything
into pyproject.toml and generate a setup.py for you at build time, so it can
be uploaded to PyPI.
Ironically, Python settled for the TOML file format here, although there is
currently no support for reading TOML files in Python's standard library.
Summary
While some alternatives exist, in 2021 I still stick to setup.py and
requirements.txt to define the metadata and dependencies of my projects.
Regarding the tooling, pip and twine are sufficient and do their job just
fine. Alternatives like pipenv and poetry exist. The scope of poetry seems to
be better aligned with my expectations, and it seems the more mature project
compared to pipenv, but in any case I'll ignore both of them until I revisit
this issue in 2022.
Closing Thoughts
While packaging in Python has improved a lot in recent years, I'm still
somewhat put off by how such a core aspect of a programming language is
treated within Python. With some jealousy, I look over to the folks at Rust
and how they seem to have gotten this aspect right from the start.
What would in my opinion improve the situation?
- Source-only uploads and building of wheels for all platforms on PyPI
infrastructure: this way we could have wheels everywhere and remove the
need to compile anything in pip install
- Standard tooling: pip has been and still is the tool of choice for
packaging in Python. For some time now, you also need twine in order to
upload your packages. setup.py upload still exists, but hasn't worked for
months on my machines. It would be great to have something that improves the
virtualenv handling and dependency management. We do have some tools with
overlapping use cases, like poetry and pipenv. pipenv is heavily
advertised and an actual PyPA project, but it feels immature in terms of
scope, release history (and emojis!) compared to poetry. poetry is
gaining a lot of traction, but it is apparently not receiving much love from
PyPA, which brings me to:
- Unbiased tool recommendations. I don't understand why PyPA is trying so hard
to make us believe pipenv is the standard tool for Python packaging.
Instead of making such claims, please provide a list of competitors and
a fair feature comparison. PyPA is providing great packaging
tutorials and is a valuable source of information around this topic. But
when it comes to the tool recommendations, I do challenge PyPA's
objectivity.