Back to backups

Last modified by Mitchell on 2022/01/26 03:17

I'd thought that Duplicati was going to serve my needs well, but it turns out that, as is typical, things are a bit more... complicated. I hit something similar to this issue (Unexpected difference in fileset version X: found Y entries, but expected Z), but on a fresh backup, even after repeated attempts. When I then attempted to recreate the database, it sat in a stalled state for longer than the backup itself took, similar to this issue. As you can imagine, an unreliable backup system isn't actually a backup system, so I went hunting for something else. This time through, I decided I wasn't going to worry quite as much about the onsite backup (shame on me) and decided to back up straight into the cloud.

Why skip the local backup? Well, because the previous method, although secure, doesn't lend itself well to restores, since separate systems handle the backups and the encryption. As a result, to restore a file, I would need to know the "when" of the file, restore the entire backup for that system at that time, and then mount that backup to find the file, rather than being able to grab the one file I want. Not being able to see the names of files being restored can be quite painful. Having access to considerably more storage allows a single system to perform both, while still being secure.

Storage

But how to get considerably more storage? In my case, I started using Microsoft 365, so would it be possible to mount a OneDrive drive in Linux? As it turns out: yes, albeit with caveats. Using rclone, it's possible to mount different cloud storage providers, including OneDrive. Installing it is as simple as you would expect:

$ apt install rclone

To set up the connection, follow the appropriate instructions for your service on the documentation page. Pay attention to any additional notes for your service (for example, for Microsoft OneDrive, the notes regarding versioning and how to disable it).

The difference here is that OneDrive is then mounted, so that the storage is streamed on an as-needed basis and is completely available. rclone doesn't have built-in fusermount support, though, so follow the instructions here to create /usr/local/bin/rclonefs. To mount on boot, the systemd approach is more reliable than the fstab approach, since it's possible to have the mount wait on network access.

The mount script assumes that there's something in cloud storage (otherwise it loops, waiting for something to appear), so you may need to mount it by hand and populate it with something first to have the systemd approach work as expected.
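
The waiting behavior described above can be sketched roughly as follows. This is not the actual rclonefs script, only an illustration of the idea: the function name wait_for_content and its timeout arguments are made up for this example, and the timeout is something the loop described above may not have.

```shell
#!/bin/bash
# Hypothetical helper (not part of rclone): block until <dir> contains at
# least one entry, or give up after a timeout.
# Usage: wait_for_content <dir> [timeout_seconds] [poll_interval_seconds]
# Both numeric arguments must be whole seconds.
wait_for_content() {
    local dir=$1 timeout=${2:-60} interval=${3:-1} waited=0
    # `ls -A` prints nothing for an empty (or not yet mounted) directory.
    while [ -z "$(ls -A "$dir" 2>/dev/null)" ]; do
        if [ "$waited" -ge "$timeout" ]; then
            echo "timed out waiting for content in $dir" >&2
            return 1
        fi
        sleep "$interval"
        waited=$((waited + interval))
    done
}
```

With an empty cloud drive, a loop like this never sees anything appear, which is why populating the storage by hand first matters.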

There are a few caveats (i.e. things that don't work) about this approach:

  • There is no file ownership - similar to SMB, all files are owned by a single user.
  • There are no file permissions.
  • There are no symlinks or hardlinks.
  • 0-byte files can be deleted, but cannot be edited.

These have an impact on the options for backup software.
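
To see which of these limitations apply to a given mount, a small probe along the following lines can help. It is only a sketch: probe_fs_features is a made-up name, the script assumes GNU stat (as on Debian or Ubuntu), and it only tests symlinks, hardlinks, and permission changes, not ownership or 0-byte files.

```shell
#!/bin/bash
# Hypothetical helper: report whether <dir> supports symlinks, hardlinks,
# and permission changes. Prints one "feature: yes|no" line per check.
# Usage: probe_fs_features <dir>
probe_fs_features() {
    local dir=$1 scratch
    scratch=$(mktemp -d "$dir/probe.XXXXXX") || return 1
    echo data > "$scratch/file"

    if ln -s file "$scratch/symlink" 2>/dev/null && [ -L "$scratch/symlink" ]; then
        echo "symlinks: yes"
    else
        echo "symlinks: no"
    fi

    if ln "$scratch/file" "$scratch/hardlink" 2>/dev/null; then
        echo "hardlinks: yes"
    else
        echo "hardlinks: no"
    fi

    # chmod can appear to succeed without taking effect, so read the mode back.
    # 751 is used because it is an unlikely default mode for a new file.
    chmod 751 "$scratch/file" 2>/dev/null
    if [ "$(stat -c %a "$scratch/file" 2>/dev/null)" = "751" ]; then
        echo "permissions: yes"
    else
        echo "permissions: no"
    fi

    # Clean up the scratch directory created for the probe.
    rm -rf "$scratch"
}
```

Running it against the mount point and against a local directory side by side makes the differences easy to spot.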

Backups

After searching through several different options, I found that the one that worked best for me is restic. Several don't play nicely with rclone mount due to symlinks/hardlinks (BackupPC, UrBackup) or file permissions (restic via sftp), and many rely on the server for encryption, meaning that compromising the server compromises all of the data (BackupPC, UrBackup). Some of them are fundamentally designed to work against tape drives rather than disk drives, leading to other issues (Bacula, Bareos, BURP). Borg Backup and Duplicacy could be options, but hit problems when attempting to secure clients from each other, since setting up per-user sftp chroot jails on top of rclone mount has its own security issues (namely, needing to grant the container CAP_SYS_ADMIN, which is... not ideal). This problem does go away if a local backup is also kept, however. Borg Backup is very dependent upon a local cache (meaning that system restores get uglier) and has very limited support for transfer protocols, and Duplicacy has a weird license, but both could potentially work as well, particularly if either a local backup is kept or a transfer protocol other than sftp is used (in the case of Duplicacy).

For handling cloud storage, I've set up access to restic via its Rest Server, so that all files are owned by the user the daemon runs as (which neatly bypasses a lot of the permissions issues). It allows for partitioning users away from each other, but at the cost of needing yet another set of credentials to juggle. Via sftp, restic attempts to set file permissions to 0700, which doesn't work so well if sftp is set up with separate accounts either. The configuration ends up being fairly straightforward:

restic.yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: backup-pv
  labels:
    name: backup-pv
spec:
  capacity:
    storage: <storage>
  volumeMode: Filesystem
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  storageClassName: local-storage
  local:
    path: <path>
  nodeAffinity:
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: kubernetes.io/hostname
              operator: In
              values:
                - <system>
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: backup-pvc
spec:
  accessModes:
    - ReadWriteOnce
  volumeMode: Filesystem
  resources:
    requests:
      storage: <storage>
  storageClassName: local-storage
  selector:
    matchLabels:
      name: "backup-pv"
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: restic
  labels:
    app: restic
spec:
  replicas: 1
  selector:
    matchLabels:
      app: restic
  template:
    metadata:
      labels:
        app: restic
    spec:
      containers:
        - name: restic
          image: restic/rest-server
          env:
            - name: OPTIONS
              value: "--private-repos"
          volumeMounts:
            - name: backup-pvc
              mountPath: /data
          ports:
            - containerPort: 8000
      volumes:
        - name: backup-pvc
          persistentVolumeClaim:
            claimName: backup-pvc
---
kind: Service
apiVersion: v1
metadata:
  name: restic
  labels:
    app: restic
spec:
  selector:
    app: restic
  ports:
    - protocol: TCP
      port: 8000
---
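# Note: extensions/v1beta1 Ingress is no longer served as of Kubernetes 1.22;
# newer clusters need networking.k8s.io/v1 and its different backend syntax.
apiVersion: extensions/v1beta1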
kind: Ingress
metadata:
  name: restic
  labels:
    app: restic
spec:
  rules:
    - host: <hostname>
      http:
        paths:
          - path: /
            backend:
              serviceName: restic
              servicePort: 8000

rclone mount doesn't like 0-byte files, so you should prepopulate the .htpasswd file that the Rest Server uses, so that it has some content.
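
One way to do the prepopulation is sketched below. The helper name ensure_nonempty is made up, and the location of .htpasswd under the volume's <path> is an assumption about how the Rest Server container is set up; check where your instance actually keeps it.

```shell
#!/bin/bash
# Hypothetical helper: make sure <file> exists and is not empty by writing
# <line> into it, but only if it is currently missing or 0 bytes long.
# Usage: ensure_nonempty <file> <line>
ensure_nonempty() {
    local file=$1 line=$2
    if [ ! -s "$file" ]; then
        # A 0-byte file on the mount can be deleted but not edited,
        # so remove it first and then create it with content.
        rm -f "$file"
        printf '%s\n' "$line" > "$file"
    fi
}

# Example (assumes htpasswd from apache2-utils is installed; the user name
# "placeholder" is arbitrary):
#   ensure_nonempty <path>/.htpasswd "$(htpasswd -nbB placeholder '<password>')"
```

An existing, non-empty file is left untouched, so running this again later does not clobber real user entries.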

Local

Once the pod is up and running, add a user entry via:

$ kubectl exec -it <pod> -- create_user <user> <password>

After this, setting up backups is straightforward. On the client system:

$ apt install restic
$ restic -r rest:https://<user>:<password>@<hostname>/<user> init

This will initialize the backup repository, asking for an encryption password. Make sure to record this password! After this, set up an environment file:

/root/backup.env
export RESTIC_REPOSITORY=rest:https://${RESTIC_USER}:<password>@restic.service.internal.toreishi.net/${RESTIC_USER}
export RESTIC_PASSWORD=<encryption password>
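
Since this file holds both the Rest Server password and the encryption password, it is worth keeping it readable by root only (for example, chmod 600 /root/backup.env). As a sketch, a check like the following could be called before sourcing the file; check_private is a made-up name, and it assumes GNU stat.

```shell
#!/bin/bash
# Hypothetical helper: succeed only if <file> grants no permissions to
# group or others (e.g. mode 600 or 400).
# Usage: check_private <file>
check_private() {
    local file=$1 mode
    mode=$(stat -c %a "$file") || return 1
    case "$mode" in
        # Octal modes ending in "00" (or plain "0") give group/other nothing.
        0|*00) return 0 ;;
        *)
            echo "$file has mode $mode; expected something like 600" >&2
            return 1
            ;;
    esac
}

# Example:
#   check_private /root/backup.env && source /root/backup.env
```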

Then, create the backup cron job, ensuring that it's executable:

/etc/cron.daily/backup
#!/bin/bash -e

set -o pipefail

<preparatory commands>

RESTIC_USER=$(hostname)
source /root/backup.env

restic backup <directory> [<directory> ...]
# Change to your own preferences, per https://restic.readthedocs.io/en/stable/060_forget.html#removing-snapshots-according-to-a-policy
restic forget --keep-daily 7 --keep-weekly 4 --keep-monthly 12
restic check
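
Since the point of this setup is being able to grab a single file, here is a rough sketch of what a restore can look like. The function names are made up; the restic subcommands and flags (find, restore, --target, --include) are restic's own. It assumes RESTIC_REPOSITORY and RESTIC_PASSWORD are already set, for example by sourcing /root/backup.env with RESTIC_USER defined first.

```shell
#!/bin/bash
# Hypothetical wrappers around restic for single-file restores.

# List the snapshots that contain files matching <pattern>.
# Usage: find_in_backups <pattern>
find_in_backups() {
    restic find "$1"
}

# Restore one path from a snapshot (default: the latest) into <target-dir>.
# The file is recreated under <target-dir> with its original path.
# Usage: restore_one <path-in-backup> <target-dir> [snapshot-id]
restore_one() {
    local path=$1 target=$2 snapshot=${3:-latest}
    restic restore "$snapshot" --target "$target" --include "$path"
}

# Example:
#   find_in_backups 'hosts'
#   restore_one /etc/hosts /tmp/restore
```

Unlike the previous setup, file names are visible here, so there's no need to restore and mount an entire backup just to find one file.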