I'd thought that Duplicati was going to serve my needs well, but it turns out that, as is typical, things are a bit more... complicated. I hit something similar to this issue (Unexpected difference in fileset version X: found Y entries, but expected Z), but on a fresh backup, and even after repeated attempts. When I then attempted to recreate the database, it ended up in a stalled state for longer than the backup itself took, similar to this issue. As you can imagine, an unreliable backup system isn't actually a backup system, so I went hunting for something else. This time through, I decided I wasn't going to worry quite as much about the onsite backup (shame on me) and decided to back up straight into the cloud.
Why skip the local backup? Well, because the previous method, although secure, doesn't lend itself well to restores, since separate systems handle the backups and the encryption. As a result, to be able to restore a file, I would need to know the "when" of the file, then restore the entire backup for that system at that time, then mount that backup to be able to find the file, rather than being able to grab the one file I want. Not being able to see the names of files being restored can be quite painful. Having access to considerably more storage allows for a single system to perform both, while still being secure.
Storage
But how to get considerably more storage? In my case, I started using Microsoft 365, so would it be possible to mount a OneDrive drive in Linux? As it turns out: yes, albeit with caveats. Using rclone, it's possible to mount different cloud storage providers, including OneDrive. Installing it is as simple as you would expect:
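As a rough sketch, the helper below prints an install command suited to the package manager it finds. The function name `rclone_install_cmd` is made up for illustration, and it assumes your distribution ships a package named `rclone`; otherwise it falls back to the install script published on rclone.org.

```shell
# Print a plausible install command for rclone on this system.
# Assumption: the distribution packages rclone under the name "rclone".
rclone_install_cmd() {
    if command -v apt-get >/dev/null 2>&1; then
        echo "sudo apt-get install -y rclone"
    elif command -v dnf >/dev/null 2>&1; then
        echo "sudo dnf install -y rclone"
    elif command -v pacman >/dev/null 2>&1; then
        echo "sudo pacman -S rclone"
    else
        # Fall back to the install script published on rclone.org.
        echo "curl https://rclone.org/install.sh | sudo bash"
    fi
}
```

Run `rclone_install_cmd` to see the suggested command, execute it, and then confirm the result with `rclone version`.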
To set up the connection, follow the appropriate instructions for your service on the documentation page. Pay attention to any additional notes for your service (for example, for Microsoft OneDrive, the notes regarding versioning and how to disable it).
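Before mounting anything, it can help to confirm that the remote actually works. The sketch below is one way to do that; the function name `check_remote` and the remote name `onedrive` in the usage note are assumptions, so substitute whatever name you chose in `rclone config`.

```shell
# Verify that a configured rclone remote is usable before mounting it.
# Usage: check_remote <remote-name>
check_remote() {
    local remote="$1"
    # listremotes prints one configured remote per line, each ending in a colon.
    if ! rclone listremotes | grep -qx "${remote}:"; then
        echo "remote '${remote}' is not configured; run 'rclone config' first" >&2
        return 1
    fi
    # Listing the top-level directories doubles as a connectivity and auth check.
    rclone lsd "${remote}:"
}
```

For example, `check_remote onedrive` should list the top-level folders of the drive if the credentials are valid.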
The difference here is that OneDrive is then mounted, so that the storage is streamed on an as-needed basis and is completely available. rclone doesn't have built-in fusermount support, though, so follow the instructions here to create /usr/local/bin/rclonefs. To mount on boot, the systemd approach is more reliable than the fstab approach, since the mount can be made to wait for network access.
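To illustrate the "wait on network access" part, here is a minimal sketch that writes a systemd service unit. Note that it calls `rclone mount` directly from a service rather than going through the rclonefs helper mentioned above, so treat it as an alternative and not as the setup described in the linked instructions. The function name `write_mount_unit`, the unit name, the binary paths, and the config file location are all assumptions to adjust for your system.

```shell
# Write a systemd unit that mounts an rclone remote once the network is online.
# Usage: write_mount_unit <remote> <mountpoint> <unit-dir>
# Assumptions: rclone lives in /usr/bin, fusermount in /bin, and the rclone
# configuration belongs to root.
write_mount_unit() {
    local remote="$1" mountpoint="$2" unit_dir="$3"
    cat > "${unit_dir}/rclone-${remote}.service" <<EOF
[Unit]
Description=rclone mount for ${remote}
# Delay the mount until the network is actually usable.
Wants=network-online.target
After=network-online.target

[Service]
Type=notify
ExecStart=/usr/bin/rclone mount ${remote}: ${mountpoint} --config /root/.config/rclone/rclone.conf --allow-other --vfs-cache-mode writes
ExecStop=/bin/fusermount -u ${mountpoint}
Restart=on-failure

[Install]
WantedBy=multi-user.target
EOF
}
```

Something like `write_mount_unit onedrive /mnt/onedrive /etc/systemd/system`, followed by `systemctl daemon-reload` and `systemctl enable --now rclone-onedrive.service`, would then bring the mount up at boot.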
There are a few caveats (i.e. things that don't work) with this approach:
- There is no file ownership - similar to SMB, all files are owned by a single user.
- There are no file permissions.
- There are no symlinks or hardlinks.
- 0-byte files can be deleted, but cannot be edited.
These have an impact on the options for backup software.
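If you want to check these caveats against your own mount, a small probe along the following lines may help. It is only a sketch: the function name `probe_features` is made up, it relies on GNU `stat`, and it only covers symlinks, hardlinks, and permission bits (ownership and the 0-byte file behavior are not tested).

```shell
# Probe a directory for filesystem features that backup tools tend to rely on.
# Usage: probe_features <directory>
probe_features() {
    local dir="$1"
    local probe="${dir}/.probe.$$"
    echo data > "${probe}" || return 1
    if ln -s "${probe}" "${probe}.sym" 2>/dev/null; then
        echo "symlinks: yes"
    else
        echo "symlinks: no"
    fi
    if ln "${probe}" "${probe}.hard" 2>/dev/null; then
        echo "hardlinks: yes"
    else
        echo "hardlinks: no"
    fi
    # A chmod that reports success but leaves the mode unchanged counts as "no".
    chmod 0600 "${probe}" 2>/dev/null || true
    if [ "$(stat -c '%a' "${probe}")" = "600" ]; then
        echo "permissions: yes"
    else
        echo "permissions: no"
    fi
    # Clean up whatever the probe managed to create.
    rm -f "${probe}" "${probe}.sym" "${probe}.hard"
}
```

Running it once against a local directory and once against the mount point makes the differences easy to compare.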
Backups
After searching through several different options, I found that the one that worked best for me is restic. Several don't play nice with rclone mount due to symlinks/hardlinks (BackupPC, UrBackup) or file permissions (restic via sftp), and many rely on the server for encryption, meaning that compromising the server compromises all of the data (BackupPC, UrBackup). Some are fundamentally designed to work against tape drives rather than disk drives, leading to other issues (Bacula, Bareos, BURP). Borg Backup and Duplicacy could be options, but hit problems when attempting to secure clients from each other, since setting up per-user sftp chroot jails on top of rclone mount has its own security issues (namely, needing to grant the container CAP_SYS_ADMIN, which is... not ideal). This problem does go away if a local backup is also kept, however. Borg Backup is very dependent upon a local cache (meaning that system restores get uglier) and has very limited support for transfer protocols, and Duplicacy has a weird license, but both could potentially work as well, particularly if either a local backup is kept or a transfer protocol other than sftp is used (in the case of Duplicacy).
For handling cloud storage, I've set up access to restic via its Rest Server, so that all files are owned by the user the daemon runs as (which neatly bypasses a lot of the permissions issues). It allows for partitioning users away from each other, but at the cost of needing yet another set of credentials to juggle. Via sftp, restic attempts to set file permissions to 0700, which doesn't work so well if sftp is set up with separate accounts either. The configuration ends up being fairly straightforward:
apiVersion: v1
kind: PersistentVolume
metadata:
  name: backup-pv
  labels:
    name: backup-pv
spec:
  capacity:
    storage: <storage>
  volumeMode: Filesystem
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  storageClassName: local-storage
  local:
    path: <path>
  nodeAffinity:
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: kubernetes.io/hostname
              operator: In
              values:
                - <system>
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: backup-pvc
spec:
  accessModes:
    - ReadWriteOnce
  volumeMode: Filesystem
  resources:
    requests:
      storage: <storage>
  storageClassName: local-storage
  selector:
    matchLabels:
      name: "backup-pv"
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: restic
  labels:
    app: restic
spec:
  replicas: 1
  selector:
    matchLabels:
      app: restic
  template:
    metadata:
      labels:
        app: restic
    spec:
      containers:
        - name: restic
          image: restic/rest-server
          env:
            - name: OPTIONS
              value: "--private-repos"
          volumeMounts:
            - name: backup-pvc
              mountPath: /data
          ports:
            - containerPort: 8000
      volumes:
        - name: backup-pvc
          persistentVolumeClaim:
            claimName: backup-pvc
---
kind: Service
apiVersion: v1
metadata:
  name: restic
  labels:
    app: restic
spec:
  selector:
    app: restic
  ports:
    - protocol: TCP
      port: 8000
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: restic
  labels:
    app: restic
spec:
  rules:
    - host: <hostname>
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: restic
                port:
                  number: 8000
Local
Once the pod is up and running, add a user entry via:
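One way to do this, sketched below, is to call the `create_user` helper that ships in the restic/rest-server image. The wrapper name `add_backup_user` is made up for illustration, and `deploy/restic` assumes the Deployment name from the manifest above in the current namespace.

```shell
# Add (or update) a REST server user by running create_user inside the pod.
# Usage: add_backup_user <user>
# Assumption: the Deployment is named "restic" in the current namespace.
add_backup_user() {
    local user="$1"
    # create_user prompts for the password and updates the server's .htpasswd file.
    kubectl exec -it deploy/restic -- create_user "${user}"
}
```

Since the server runs with `--private-repos`, each user can only reach the repository path that matches their own user name, which is why the repository URL ends in `/<user>`.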
After this, setting up backups is straightforward. On the client system:
$ restic -r rest:https://<user>:<password>@<hostname>/<user> init
This will initialize the repository, asking for an encryption password. Make sure to record this password, since the data cannot be recovered without it! After this, set up an environment file:
export RESTIC_REPOSITORY="rest:https://${RESTIC_USER}:<password>@<hostname>/${RESTIC_USER}"
export RESTIC_PASSWORD=<encryption password>
Then, create the backup cron job, ensuring that it's executable:
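#!/bin/bash
# Make a pipeline fail if any command in it fails.
set -o pipefail
<preparatory commands>
# Use this machine's hostname as the user name.
RESTIC_USER=$(hostname)
# Load the restic settings, including the encryption password.
source /root/backup.env
restic backup <directory> [<directory> ...]
# Change to your own preferences, per https://restic.readthedocs.io/en/stable/060_forget.html#removing-snapshots-according-to-a-policy
restic forget --keep-daily 7 --keep-weekly 4 --keep-monthly 12
restic check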
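Putting the script in place could look something like the sketch below. The function name `install_backup_job` and the target name `restic-backup` are made up, and the choice of a cron.daily-style directory is an assumption; on Debian-style systems, run-parts skips file names that contain a dot, hence the extension-free name.

```shell
# Install a backup script as a cron job and lock down its environment file.
# Usage: install_backup_job <script> <env-file> <cron-dir>
install_backup_job() {
    local script="$1" env_file="$2" cron_dir="$3"
    # Copy the script without an extension and mark it executable.
    install -m 0755 "${script}" "${cron_dir}/restic-backup"
    # The environment file holds the encryption password; keep it owner-only.
    chmod 0600 "${env_file}"
}
```

For example, `install_backup_job ./backup.sh /root/backup.env /etc/cron.daily` would schedule the script to run once a day.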