Gunnar Wolf: Some tips for those who still administer Drupal 7-based sites

A bit of history: Drupal at my workplace (and in Debian)
My main day-to-day responsibility at my workplace is, and has been for 20 years,
to take care of the network infrastructure for UNAM's Economics Research
Institute. One of the most visible parts of this responsibility is to ensure we
have a working Web presence, and that it caters to the needs of our academic
community.
I joined the Institute in January 2005. Back then, our designer pushed static
versions of our webpage, built entirely on her computer. This was standard
practice at the time, and it lasted through some redesigns, but I soon started
advocating for the adoption of a Content Management System. After evaluating
some alternatives, I recommended adopting Drupal. It took us quite a while to
make the change: even though I clearly recall starting work toward adopting it
as early as 2006, according to the Internet Archive we switched to a
Drupal-backed site around June 2010. We started using it somewhere in version
6's lifecycle.
As for my Debian work, by late 2012 I started getting involved in the
maintenance of the drupal7
package,
and by April 2013 I became its primary maintainer. I kept the drupal7
package
up to date in Debian until 2018; the supported build methods for Drupal 8 are
not compatible with Debian (mainly, bundling third-party libraries and updating
them without coordination with the rest of the ecosystem), so towards the end of
2016, I announced I would not package Drupal 8 for
Debian.
By March 2016, we had migrated our main page to Drupal 7. By then, we already
had several other sites for our academic projects, but my narrative follows our
main Web site. I did manage to migrate several Drupal 6 (D6) sites to Drupal 7
(D7); it was quite an involved process, never transparent to the user, and many
of our users suffered the backlash of long downtimes (or partial downtimes,
with sites only half-available). For our main site, we took the opportunity to
do a complete redesign and deployed a fully new site.
You might note that March 2016 is after the release of D8 (November 2015). I
don't recall many of the specifics behind this decision, but if I'm not
mistaken, building the new site was a several-months-long process, not only
because of the technical work of setting it up, but because of the legwork of
getting all of the needed information from the different areas that had to be
represented in the Institute. Not only that: Drupal sites often include tens of
contributed themes and modules; the technological shift the project underwent
between its 7 and 8 releases was too deep, and modules took a long time to
become available for the new release, if they ever did (many themes and modules
were outright abandoned).
Naturally, the Drupal Foundation wanted to evolve and deprecate the old
codebase. But the pain of migrating from D7 to D8 is too great, and many sites
have remained on version 7. Eight years after D8's release, almost 40% of
Drupal installs are still on version 7, and a similar proportion runs a
currently-supported release (10 or 11). And while the Drupal Foundation did a
great job of providing very-long-term support for D7, I understand the burden
is becoming too much, so close to a year ago (and after pushing back the D7
end-of-life date several times), they finally announced support will end this
upcoming January 5.
Drupal 7 must go!
I found the following usage graphs quite interesting: the usage statistics for
all Drupal versions follow a very positive slope, peaking around 2014 during
the best years of D7, and somewhat stagnating afterwards, staying since 2015 at
the 25,000-28,000 sites mark (I'm very tempted to copy the graphs, but
builtwith's terms of use are very clear in not allowing it). There is a sharp
drop in the last year, which I attribute to the people leaving D7 for other
technologies after its end-of-life announcement. This becomes clearer looking
only at D7's usage statistics: D7 peaks at 15,000 installs in 2016, stays there
for close to five years, and then drops sharply to under 7,500 sites in the
span of one year.
D8 has a more regular rise, peak and fall, peaking at ~8,500 sites between 2020
and 2021, and has been down to close to 2,500 for some months already; D9 had a
very brief peak of almost 9,000 sites in 2023 and is now close to half of that.
Currently, the Drupal king appears to be D10, still on a positive slope and
with over 9,000 sites. Drupal 11 is still just a blip on builtwith's radar,
with 3 registered sites as of September 2024.
After writing this last paragraph, I came across the statistics found on the
Drupal webpage; the methodology for acquiring its data is completely different:
while builtwith's methodology is their trade secret, you can read more about
how Drupal's data is gathered (and agree or disagree with it), but at least you
have a page detailing 12 years so far of reported data, producing a graph that
can be shared under the CC BY-SA license.
That graph is disaggregated into minor versions, and I don't want to come up
with yet another graph for it, but it supports (most of) the narrative I
presented above, although I do miss the recent drop builtwith reported in D7's
numbers!
And what about Backdrop?
During the D8 release cycle, a group of Drupal developers were not happy with
the depth of the architectural changes that were being adopted, particularly
the transition to the Symfony PHP component framework, and forked the D7
codebase to create the Backdrop CMS: a modern version of Drupal that keeps the
known and tested architecture it had. The Backdrop developers keep working
closely with the Drupal community, and although its usage numbers are way
smaller than Drupal's, it seems to be sustainable and lively. Of course, as
with the numbers I presented in the previous section, you can see that
Backdrop's numbers in builtwith are way, way lower.
I have found it to be a very warm and welcoming community, eager to receive new
members. And, thanks to its contributed D2B Migrate
module, I found it is quite easy
to migrate a live site from Drupal 7 to Backdrop.
Migration by playbook!
So… well, I'm an academic. And (if it's not obvious to you after reading this
far), one of the things I must do in my job is to write. So I decided to write
an article inviting my colleagues to consider Backdrop for their D7 sites, in
Cuadernos Técnicos Universitarios de la DGTIC, a young journal in our
university for showcasing technical academic work. And now that my article got
accepted and published, I'm happy to share it with you (if you can read
Spanish, of course). But anyway…
Given that I have several sites to migrate, and that I'm trying to get my
colleagues to follow suit, I decided to automate the migration by writing an
Ansible playbook to do the heavy lifting. Of course, the playbook's users will
probably need to tweak it a bit to fit their needs. I'm also far from an
Ansible expert, so I'm sure there is ample room for improvement in my style.
But it works. Quite well, I must add.
But with this size of database…
I did stumble across a big pebble, though. I am working on the migration of one
of my users' sites, and found that its database is huge. I checked the
mysqldump output, and it came to close to 3GB of data. And given that
D2B_migrate is meant to work via a Web interface (my playbook works around that
by using a client I wrote with Perl's WWW::Mechanize), I repeatedly ran into
PHP's limits: maximum POST size (post_max_size), maximum upload size
(upload_max_filesize), maximum memory size (memory_limit)…
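For the curious, here is a minimal sketch of what such a WWW::Mechanize client
can look like. This is an illustration, not my actual playbook code: the URLs
and the upload field name are hypothetical placeholders, and D2B_migrate's real
form will differ, so inspect the form your site serves before adapting it:
#!/usr/bin/perl
# Sketch: drive a Web upload form from a script instead of a browser.
# URLs and the 'dump_file' field name are hypothetical placeholders.
use strict;
use warnings;
use WWW::Mechanize;

my $mech = WWW::Mechanize->new(autocheck => 1);

# A real client must first log in: admin pages are not anonymous.
# 'user-login' is the standard Drupal/Backdrop login form ID.
$mech->get('https://example.org/user/login');
$mech->submit_form(
    form_id => 'user-login',
    fields  => { name => 'admin', pass => 'secret' },
);

# Fetch the upload page and submit the dump. For file inputs,
# WWW::Mechanize takes the local file name as the field value.
$mech->get('https://example.org/admin/d2b-migrate');
$mech->submit_form(
    with_fields => { dump_file => '/tmp/d7_backup.sql' },
);

printf "Server answered: %d\n", $mech->status;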
I asked for help in Backdrop's Zulip chat site, and my attention was steered
away from fixing PHP towards something more obvious: why is the database so
large? So I took a quick look at the database (or rather: my first look was at
the database server's filesystem usage). MariaDB, with its default
file-per-table setting, stores each InnoDB table as a separate file on disk, so
I looked for the nine largest tables:
# ls -lhS | head
total 3.8G
-rw-rw---- 1 mysql mysql 2.4G Dec 10 12:09 accesslog.ibd
-rw-rw---- 1 mysql mysql 224M Dec 2 16:43 search_index.ibd
-rw-rw---- 1 mysql mysql 220M Dec 10 12:09 watchdog.ibd
-rw-rw---- 1 mysql mysql 148M Dec 6 14:45 cache_field.ibd
-rw-rw---- 1 mysql mysql 92M Dec 9 05:08 aggregator_item.ibd
-rw-rw---- 1 mysql mysql 80M Dec 10 12:15 cache_path.ibd
-rw-rw---- 1 mysql mysql 72M Dec 2 16:39 search_dataset.ibd
-rw-rw---- 1 mysql mysql 68M Dec 2 13:16 field_revision_field_idea_principal_articulo.ibd
-rw-rw---- 1 mysql mysql 60M Dec 9 13:19 cache_menu.ibd
A single table, the access log, is over 2.4GB long. Most of the tables that
follow it hold cache (or otherwise regenerable) data; I can perfectly live
without their contents in our new site! But I don't want to touch the slightest
bit of this site until I'm satisfied with the migration process, so I found a
way to exclude those tables in a non-destructive way: D2B_migrate works with a
mysqldump output, and that output wraps each table's data in a LOCK TABLES
statement before its INSERTs and an UNLOCK TABLES statement right after them.
So I can simply stop copying lines whenever I see a LOCK TABLES for one of the
unwanted tables and resume at the following UNLOCK TABLES; the CREATE TABLE
statements are untouched, so the tables still exist in the restored database,
only empty:
$ perl -e '$output = 1; while (<>) { $output=0 if /^LOCK TABLES `(accesslog|search_index|watchdog|cache_field|cache_path)`/; $output=1 if /^UNLOCK TABLES/; print if $output }' < /tmp/d7_backup.sql > /tmp/d7_backup.eviscerated.sql; ls -hl /tmp/d7_backup.sql /tmp/d7_backup.eviscerated.sql
-rw-rw-r-- 1 gwolf gwolf 216M Dec 10 12:22 /tmp/d7_backup.eviscerated.sql
-rw------- 1 gwolf gwolf 2.1G Dec 6 18:14 /tmp/d7_backup.sql
Five seconds later, I'm done! The database is now a tenth of its size, and
D2B_migrate is happy to take it. And I'm a big step closer to ending my
reliance on (this bit of) legacy code for my highly-visible sites.
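As a quick sanity check (my suggestion here, not part of the process above),
the same dump structure the filter relies on also makes verification easy:
every table whose row data survived still carries its LOCK TABLES line. The
five excluded tables should be missing from the output of the one-liner below,
while their CREATE TABLE statements remain in the file:
$ perl -ne 'print "$1\n" if /^LOCK TABLES `(\w+)`/' /tmp/d7_backup.eviscerated.sql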
