I've been using
rsnapshot to take backups of around 10 servers and laptops for well over 15 years, and it is a remarkably reliable tool that has proven itself many times. Rsnapshot uses
rsync over
SSH and maintains a temporal hard-link file pool. Once rsnapshot is configured and running on the backup server, you get a hard-link farm with directories like this for the remote server:
/backup/serverA.domain/.sync/foo
/backup/serverA.domain/daily.0/foo
/backup/serverA.domain/daily.1/foo
/backup/serverA.domain/daily.2/foo
...
/backup/serverA.domain/daily.6/foo
/backup/serverA.domain/weekly.0/foo
/backup/serverA.domain/weekly.1/foo
...
/backup/serverA.domain/monthly.0/foo
/backup/serverA.domain/monthly.1/foo
...
/backup/serverA.domain/yearly.0/foo
I can browse and rescue files easily, going back in time when needed.
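The rotation stays cheap because unchanged files in the rotated snapshots are hard links to the same inode, not copies. A self-contained illustration (in /tmp, not the real backup tree):

```shell
# Simulate one rotation step: cp -al makes a hard-link copy, so the
# snapshot costs almost no disk space for unchanged files.
rm -rf /tmp/pool
mkdir -p /tmp/pool/.sync
echo data > /tmp/pool/.sync/foo
cp -al /tmp/pool/.sync /tmp/pool/daily.0
# Both paths now report the same inode and a link count of 2:
stat -c '%i %h' /tmp/pool/.sync/foo /tmp/pool/daily.0/foo
```

Only when a file changes does the next sync break the link and store a new copy, which is what makes going back in time affordable.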
The
rsnapshot project README explains more, and there is a long
rsnapshot HOWTO, although I usually find the
rsnapshot man page the easiest to digest.
I have
stored multi-TB Git-LFS data on GitLab.com for some time. The yearly renewal is coming up, and the price for Git-LFS storage on GitLab.com is now excessive (~$10,000/year). I have reworked my workflow and finally migrated
debdistget to only store Git-LFS stubs on GitLab.com and push the real files to S3 object storage. The cost for this is barely measurable; I have yet to run into the 25/month warning threshold.
But how do you backup stuff stored in S3?
For some time, my S3 backup solution has been to run the
minio-client mirror command to download all S3 objects to my laptop, and rely on rsnapshot to keep backups of this. While 4TB NVMe drives are relatively cheap, I've felt for quite some time that this disk and network churn on my laptop is unsatisfactory.
What is a better approach?
I find S3 hosting sites fairly unreliable by design. Only a couple of clicks in your web browser and you have dropped 100TB of data. Or someone else has, after stealing your plaintext-equivalent cookie. Thus, I haven't really felt comfortable using any S3-based backup option. I prefer to self-host, although continuously running a mirror job is not sufficient: if I accidentally drop the entire S3 object store, my mirror run will remove all files locally too.
The rsnapshot approach that allows going back in time and having data on self-managed servers feels superior to me.
What if we could use rsnapshot with an S3 client instead of rsync?
Someone else
asked about this several years ago, and the suggestion was to use the FUSE-based
s3fs, which sounded unreliable to me. After some experimentation, working around some hard-coded assumptions in the
rsnapshot implementation, I came up with a small configuration pattern and a wrapper tool to implement what I desired.
Here is my configuration snippet:
cmd_rsync /backup/s3/s3rsync
rsync_short_args -Q
rsync_long_args --json --remove
lockfile /backup/s3/rsnapshot.pid
snapshot_root /backup/s3
backup s3:://hetzner/debdistget-gnuinos ./debdistget-gnuinos
backup s3:://hetzner/debdistget-tacos ./debdistget-tacos
backup s3:://hetzner/debdistget-diffos ./debdistget-diffos
backup s3:://hetzner/debdistget-pureos ./debdistget-pureos
backup s3:://hetzner/debdistget-kali ./debdistget-kali
backup s3:://hetzner/debdistget-devuan ./debdistget-devuan
backup s3:://hetzner/debdistget-trisquel ./debdistget-trisquel
backup s3:://hetzner/debdistget-debian ./debdistget-debian
The idea is to save a backup of a couple of S3 buckets under
/backup/s3/.
I have some scripts that take a base
rsnapshot.conf file and append my per-directory configuration so that this becomes a complete configuration. If you are curious how I roll this,
backup-all invokes
backup-one, appending my
rsnapshot.conf template with the snippet above.
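A minimal sketch of that assembly step, with made-up file names rather than my actual backup-all/backup-one scripts; note that rsnapshot requires tab-separated fields in its configuration file:

```shell
# Build a complete configuration by appending the per-bucket snippet to
# a shared template.  rsnapshot.conf fields must be tab-separated.
TEMPLATE=/tmp/rsnapshot.conf.template
SNIPPET=/tmp/s3-snippet.conf
CONF=/tmp/rsnapshot-s3.conf
printf 'config_version\t1.2\n' > "$TEMPLATE"
printf 'snapshot_root\t/backup/s3\n' > "$SNIPPET"
cat "$TEMPLATE" "$SNIPPET" > "$CONF"
```

The real scripts do essentially this concatenation before invoking rsnapshot -c on the result.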
The
s3rsync wrapper script is the essential hack that converts rsnapshot's rsync parameters into something that talks S3. The script is as follows:
#!/bin/sh
set -eu
S3ARG=
for ARG in "$@"; do
    case $ARG in
        # Strip the s3:// pseudo-scheme so mc sees alias/bucket.
        s3:://*) S3ARG="$S3ARG $(echo "$ARG" | sed -e 's,s3:://,,')";;
        # Drop rsnapshot's short rsync arguments (-Q, -Qv, ...).
        -Q*) ;;
        *) S3ARG="$S3ARG $ARG";;
    esac
done
echo /backup/s3/mc mirror $S3ARG
exec /backup/s3/mc mirror $S3ARG
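To see the argument mapping in isolation, here is the same loop run over a sample rsnapshot invocation (bucket and paths are made up), only echoing the command instead of exec'ing mc:

```shell
# Replay the wrapper's argument translation on sample input:
# rsnapshot passes rsync-style arguments, mc gets alias/bucket paths.
S3ARG=
for ARG in -Qv --json --remove 's3:://hetzner/example-bucket' /backup/s3/.sync/example-bucket; do
    case $ARG in
        s3:://*) S3ARG="$S3ARG $(echo "$ARG" | sed -e 's,s3:://,,')";;
        -Q*) ;;
        *) S3ARG="$S3ARG $ARG";;
    esac
done
echo "mc mirror$S3ARG"
# prints: mc mirror --json --remove hetzner/example-bucket /backup/s3/.sync/example-bucket
```

The rsync short arguments are swallowed because mc would reject them, while the long arguments configured in rsnapshot.conf are really mc options passed straight through.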
It uses the
minio-client tool. I first tried
s3cmd but its
sync command reads all files to compute MD5 checksums every time you invoke it, which is very slow. The
mc mirror command is blazingly fast since it only compares mtimes, just like
rsync or
git.
First you need to store credentials for your S3 bucket. These are stored in plaintext in
~/.mc/config.json, which I find to be sloppy security practice, but I don't know of any better way to do this. Replace
URL with your provider's endpoint, and
AKEY and
SKEY with your access token and secret token from your S3 provider:
/backup/s3/mc alias set hetzner URL AKEY SKEY
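One small mitigation for the plaintext credentials, assuming a standard GNU/Linux setup, is to make sure the file is only readable by the backup user:

```shell
# Restrict the mc credential store to the owning user only.
MC_CFG="${HOME}/.mc"
mkdir -p "$MC_CFG"
touch "$MC_CFG/config.json"
chmod 700 "$MC_CFG"
chmod 600 "$MC_CFG/config.json"
```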
If I invoke a
sync job for a fully synced up directory the output looks like this:
root@hamster /backup# /run/current-system/profile/bin/rsnapshot -c /backup/s3/rsnapshot.conf -V sync
Setting locale to POSIX "C"
echo 1443 > /backup/s3/rsnapshot.pid
/backup/s3/s3rsync -Qv --json --remove s3:://hetzner/debdistget-gnuinos \
/backup/s3/.sync//debdistget-gnuinos
/backup/s3/mc mirror --json --remove hetzner/debdistget-gnuinos /backup/s3/.sync//debdistget-gnuinos
"status":"success","total":0,"transferred":0,"duration":0,"speed":0
/backup/s3/s3rsync -Qv --json --remove s3:://hetzner/debdistget-tacos \
/backup/s3/.sync//debdistget-tacos
/backup/s3/mc mirror --json --remove hetzner/debdistget-tacos /backup/s3/.sync//debdistget-tacos
"status":"success","total":0,"transferred":0,"duration":0,"speed":0
/backup/s3/s3rsync -Qv --json --remove s3:://hetzner/debdistget-diffos \
/backup/s3/.sync//debdistget-diffos
/backup/s3/mc mirror --json --remove hetzner/debdistget-diffos /backup/s3/.sync//debdistget-diffos
"status":"success","total":0,"transferred":0,"duration":0,"speed":0
/backup/s3/s3rsync -Qv --json --remove s3:://hetzner/debdistget-pureos \
/backup/s3/.sync//debdistget-pureos
/backup/s3/mc mirror --json --remove hetzner/debdistget-pureos /backup/s3/.sync//debdistget-pureos
"status":"success","total":0,"transferred":0,"duration":0,"speed":0
/backup/s3/s3rsync -Qv --json --remove s3:://hetzner/debdistget-kali \
/backup/s3/.sync//debdistget-kali
/backup/s3/mc mirror --json --remove hetzner/debdistget-kali /backup/s3/.sync//debdistget-kali
"status":"success","total":0,"transferred":0,"duration":0,"speed":0
/backup/s3/s3rsync -Qv --json --remove s3:://hetzner/debdistget-devuan \
/backup/s3/.sync//debdistget-devuan
/backup/s3/mc mirror --json --remove hetzner/debdistget-devuan /backup/s3/.sync//debdistget-devuan
"status":"success","total":0,"transferred":0,"duration":0,"speed":0
/backup/s3/s3rsync -Qv --json --remove s3:://hetzner/debdistget-trisquel \
/backup/s3/.sync//debdistget-trisquel
/backup/s3/mc mirror --json --remove hetzner/debdistget-trisquel /backup/s3/.sync//debdistget-trisquel
"status":"success","total":0,"transferred":0,"duration":0,"speed":0
/backup/s3/s3rsync -Qv --json --remove s3:://hetzner/debdistget-debian \
/backup/s3/.sync//debdistget-debian
/backup/s3/mc mirror --json --remove hetzner/debdistget-debian /backup/s3/.sync//debdistget-debian
"status":"success","total":0,"transferred":0,"duration":0,"speed":0
touch /backup/s3/.sync/
rm -f /backup/s3/rsnapshot.pid
/run/current-system/profile/bin/logger -p user.info -t rsnapshot[1443] \
/run/current-system/profile/bin/rsnapshot -c /backup/s3/rsnapshot.conf \
-V sync: completed successfully
root@hamster /backup#
You can tell from the paths that this machine runs Guix. This was the first production use of the Guix System for me, and the machine has been running since 2015 (with the occasional new hard drive). Before, I used rsnapshot on Debian, but some stable release of Debian dropped the rsnapshot package, paving the way for me to test Guix in production on a non-Internet exposed machine. Unfortunately,
mc is not packaged in Guix, so you will have to install it manually from the
MinIO Client GitHub page.
Running the daily rotation looks like this:
root@hamster /backup# /run/current-system/profile/bin/rsnapshot -c /backup/s3/rsnapshot.conf -V daily
Setting locale to POSIX "C"
echo 1549 > /backup/s3/rsnapshot.pid
mv /backup/s3/daily.5/ /backup/s3/daily.6/
mv /backup/s3/daily.4/ /backup/s3/daily.5/
mv /backup/s3/daily.3/ /backup/s3/daily.4/
mv /backup/s3/daily.2/ /backup/s3/daily.3/
mv /backup/s3/daily.1/ /backup/s3/daily.2/
mv /backup/s3/daily.0/ /backup/s3/daily.1/
/run/current-system/profile/bin/cp -al /backup/s3/.sync /backup/s3/daily.0
rm -f /backup/s3/rsnapshot.pid
/run/current-system/profile/bin/logger -p user.info -t rsnapshot[1549] \
/run/current-system/profile/bin/rsnapshot -c /backup/s3/rsnapshot.conf \
-V daily: completed successfully
root@hamster /backup#
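To automate all of this, the sync and rotation runs can be scheduled; a hypothetical crontab entry (rsnapshot is typically driven from cron, or from mcron on Guix):

```
30 2 * * * root /run/current-system/profile/bin/rsnapshot -c /backup/s3/rsnapshot.conf sync && /run/current-system/profile/bin/rsnapshot -c /backup/s3/rsnapshot.conf daily
```

The weekly, monthly and yearly rotations get their own entries at correspondingly longer intervals.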
Hopefully you will feel inspired to take backups of your S3 buckets now!