Dirk Eddelbuettel: #49: The Two Cultures of Deploying Statistical Software

renv by Rcpp collaborator and pal Kevin. The expressed hope is that by nailing down a (sub)set of packages, outcomes are constrained to be unchanged. Hope springs eternal, clearly. (Personally, if need be, I do the same with Docker containers and their respective Dockerfile.)
On the other hand, rolling is a fundamentally different approach. One (well-known) example is Google building everything at @HEAD. The entire (ginormous) code base is treated as a mono-repo which, at any point in time, is expected to be buildable as is. All changes are pre-tested to be free of side effects on other parts. This sounds hard, and is likely more involved than the alternative "whatever works" approach of making independent changes and just hoping for the best.
Another example is a rolling (Linux) distribution such as Debian. Changes are first committed to a staging area (Debian calls this the unstable distribution) and, if no side effects are seen, propagated after a fixed number of days to the rolling distribution (called testing). With this mechanism, testing should always be installable too. Based on the rolling distribution, at certain times (for Debian roughly every two years) a release is made from testing into stable (following more elaborate testing). The released stable version is then immutable (apart from fixes for seriously grave bugs and, of course, security updates). So this connects frequent, rolling updates with the production of an immutable fixed set: a release.
This Debian approach has been influential for many other projects, including CRAN, as can be seen in aspects of its system providing a rolling set of curated packages. Instead of a staging area for all packages, extensive tests are run on candidate packages before an update is added. This aims to ensure quality and consistency, and it has worked remarkably well. We argue that it has clearly contributed to the success and renown of CRAN.
Now, when accessing CRAN from R, we fundamentally have two accessor functions. But seemingly only one is widely known and used. In what we may call the Jeff model, everybody is happy to deploy install.packages() for initial installations.
That sentiment is clearly expressed by this bsky post:
One of my #rstats coding rituals is that every time I load a @vincentab.bsky.social package I go check for a new version because invariably it's been updated with 18 new major features
And that is why we have two cultures. Because some of us, yours truly included, also use update.packages() at recurring (frequent !!) intervals: daily or near-daily for me. The goodness and, dare I say, gift of packages is not limited to those by my pal Vincent. CRAN updates all the time, and updates are (generally) full of (usually excellent) changes, fixes, or new features. So update frequently! Doing (many but small) updates (frequently) is less invasive than (large, infrequent) waterfall-style changes!
But the fear of change, or disruption, is clearly pervasive. One can
only speculate why. Is the experience of updating so painful on other
operating systems? Is it maybe a lack of exposure / tutorials on best
practices?
These Two Cultures coexist. When I delivered the talk in Mons, I briefly asked for a show of hands among all the R users in the audience to see who in fact does use update.packages() regularly. And maybe a handful of hands went up: surprisingly few!
Now back to the context of installing packages: clearly, installing only (without ever updating) has its uses. For continuous integration checks we generally install into ephemeral, temporary setups. Some debugging work may be done in one-off container or virtual machine setups. But all other uses may well be in maintained setups. So consider calling update.packages() once in a while. Or even weekly or daily.
The rolling feature of CRAN is a real benefit, and it is
there for the taking and enrichment of your statistical computing
experience.
So to sum up, the real power is to use install.packages() to obtain fabulous new statistical computing resources, ideally in an instant; and update.packages() to keep these fabulous resources current and free of (known) bugs.
This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. If you like this or other open-source work I do, you can now sponsor me at GitHub.