Earlier today I watched the presentation Jonathan Oxer gave during LCA about
package caching solutions.
Although it was certainly an interesting presentation and although I very much
agree that my current local mirror is wasting a lot of disk space and bandwidth,
I'm still not going to switch from debmirror to any of the available
caching solutions, because (unless I'm really missing something) none of them
scratches my itch.
My local mirror currently consists of five architectures (i386, amd64, hppa,
sparc and s390) and only has unstable and testing. I use it for:
- (fast and convenient) updating of my systems
- doing Debian Installer builds and installation tests
- test builds of installation CDs (using debian-cd)
Now, the last one is somewhat hard to do through a caching proxy (debian-cd
uses hardlinks to packages on a local mirror instead of retrieving packages),
so let's concentrate on the first two.
Caching is great if you have a large number of machines sitting behind the
proxy that share an architecture and are all likely to need roughly the same
packages: the first one triggers the download of a package and the rest get it
almost instantaneously.
It is a lot less great if you have only one, maybe two systems per architecture:
most of the time you'll still end up going down that (relatively) slow ADSL
connection.
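To put a rough number on that, here is a back-of-the-envelope sketch in Python. It assumes an idealized cache in which every machine of one architecture requests exactly the same package versions, so only the first request for each package goes out over the slow line. The function name `cache_hit_rate` is made up for this illustration.

```python
def cache_hit_rate(machines: int) -> float:
    """Fraction of package requests that are served from the cache.

    Idealized assumption: all machines of one architecture request the
    same package versions, so only the first request per package misses.
    """
    if machines < 1:
        raise ValueError("need at least one machine")
    return (machines - 1) / machines


for n in (1, 2, 20):
    print(f"{n:2d} machine(s): {cache_hit_rate(n):.0%} of requests hit the cache")
```

With one machine per architecture nothing is ever served from the cache, with two machines at best half of the requests are, and only with a larger group does the cache start to pay off.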
An important reason why I have my mirror is so that when I do my daily updates
for sid or run an installation test, the packages are already available locally.
I really don't want to double or even treble the time needed for installation
tests just because some required packages aren't yet available locally and need
to be downloaded over that slow line.
So, I have my partial mirror. Somewhat tuned (I exclude some ridiculously large
debug packages for example, saving about 10GB), but still with a lot of
junk^Wpackages on it I'll never ever use, especially for hppa, sparc and s390
as those systems only have fairly basic installations.
Getting rid of that would significantly reduce the size of my daily sync and
allow me, for example, to also have a mirror of stable and oldstable, keep old
versions of packages and probably still save a lot of disk space.
Wishlist
What we should have is a hybrid solution: a program that will present itself
as a proxy to clients, but is smart enough to pre-fetch new packages that are
likely to be needed in a sync run, based on usage data from the proxy and
configuration settings.
Some ideas for features/configuration options it could have:
- should support both source and binaries (debs and udebs)
- transparently retrieve packages that are requested but not available locally
  (just like a regular proxy)
- options to always sync (per architecture):
  - all required and standard packages
  - packages listed in a certain config file
  - packages in a certain section (e.g. udebs in the debian-installer section)
- for packages that have been used (requested by clients) within the last X
  days: pre-fetch any new version (per architecture)
  - options to include security/volatile updates in that
- be smart about ABI changes: when a package name changes because the ABI
  version changes, also pre-fetch the new package; an alternative solution for
  that could be:
  - for recently used packages: ensure that their dependencies are also pre-fetched;
    this would also provide support when packages are renamed (dependencies in the
    transition packages would ensure the new packages are pre-fetched)
- expire packages from the mirror that have not been used for X days
  - allow faster expiration of certain types of packages (-dev or -dbg for example)
- options to keep X previous versions that are no longer referenced in the
  Packages files for Y days
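
To make the wishlist a bit more concrete, the core selection step could look roughly like the Python sketch below. Everything in it is an assumption for illustration only: the `Package` record, the `plan_sync` function, the in-memory dictionaries standing in for parsed Packages files and for the proxy's usage log, the example package names and the arbitrary date. It ignores versions, version constraints and alternative dependencies, and covers only the always-sync, recently-used, dependency-following and expiry ideas.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class Package:
    """Hypothetical, heavily simplified view of one Packages file stanza."""
    name: str
    arch: str                    # architecture of the Packages file it came from
    priority: str = "optional"
    depends: tuple = ()          # dependency names only, no version constraints


def plan_sync(index, last_used, always, today, keep_days):
    """Return (wanted, expire) as sets of (name, arch) keys.

    index:     {(name, arch): Package} for the current Packages files
    last_used: {(name, arch): date} of the last client request via the proxy
    always:    package names from a config file that are always synced
    """
    cutoff = today - timedelta(days=keep_days)

    # Seed: required/standard packages and names listed in the config file.
    wanted = {
        key for key, pkg in index.items()
        if pkg.priority in ("required", "standard") or pkg.name in always
    }
    # Plus everything a client asked for within the last keep_days days.
    wanted |= {
        key for key, used in last_used.items()
        if used >= cutoff and key in index
    }

    # Follow dependencies, so that a renamed package is picked up through
    # the transitional package that depends on the new name.
    todo = list(wanted)
    while todo:
        name, arch = todo.pop()
        for dep in index[(name, arch)].depends:
            key = (dep, arch)
            if key in index and key not in wanted:
                wanted.add(key)
                todo.append(key)

    # Expire packages that were requested at some point but are no longer wanted.
    expire = set(last_used) - wanted
    return wanted, expire


today = date(2000, 1, 31)  # arbitrary date for the example
index = {
    ("base-files", "hppa"): Package("base-files", "hppa", priority="required"),
    ("libfoo1", "hppa"): Package("libfoo1", "hppa", depends=("libfoo2",)),
    ("libfoo2", "hppa"): Package("libfoo2", "hppa"),
    ("bigdoc", "hppa"): Package("bigdoc", "hppa"),
}
last_used = {
    ("libfoo1", "hppa"): today - timedelta(days=3),
    ("bigdoc", "hppa"): today - timedelta(days=90),
}
wanted, expire = plan_sync(index, last_used, always=set(), today=today, keep_days=30)
print(sorted(wanted))
print(sorted(expire))
```

In this made-up example `libfoo2` is kept in sync only because the recently used `libfoo1` depends on it, while `bigdoc`, last requested 90 days ago, is marked for expiry. A real implementation would also have to compare versions, keep old versions around for a while and actually download things, none of which is shown here.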
Possibly such an implementation could even be used on some of the lower tier
Debian mirrors.
Unfortunately, unlike some of my esteemed colleagues, I'm not able to just
whip up something like this, so I'm condemned to wait and see if there's
someone else who'd like to pick up this idea. I am of course more than
willing to help develop this idea further and to test it.
Now, if I've totally failed in my research and something like this already
exists, a pointer in the right direction would be much appreciated.