Backing Up Paperless-ngx with Rclone


If I were allowed to self-host only one piece of software, it would undoubtedly be Paperless-ngx[1]. I seldom consider software life-changing, but this is one of those rare occasions. Paperless-ngx has let me establish a consistent workflow for scanning and organizing documents. I scan a document with my ScanSnap iX1600 into a folder on my desktop that Syncthing[2] monitors. Syncthing syncs that folder to the "consume" directory on my Paperless-ngx server. When Paperless-ngx detects a new file in the consume directory, it ingests it, runs OCR, and tags it automatically using its machine-learning-based matching. As a result, I can digitize and organize most documents I receive while keeping physical copies of only the vitally important ones.

That being said, it would be unwise to run such an important service without regular backups. I chose Rclone[3] for backups because it supports a wide variety of remote storage providers. Most importantly for this scenario, Rclone can also encrypt files before uploading them, using its crypt remote type.

Note that this post is a quick overview of how I implemented backups for my Paperless-ngx instance, not a hands-on tutorial.

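Below is the shell script that I use, saved as /home/tlivolsi/paperless_backup.sh (the path the systemd service runs later). It exports a zipped archive with Paperless-ngx's document_exporter, removes the oldest remote archive once MAX_BACKUPS of them exist, uploads the new archive to the encrypted remote, and deletes the local copy after confirming the upload: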
#!/bin/bash

HOST_EXPORT_PATH="/home/tlivolsi/paperless-ngx/export"
DOCKER_COMPOSE_PATH="/home/tlivolsi/paperless-ngx"
RCLONE_CONFIG="/home/tlivolsi/.config/rclone/rclone.conf"
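# Export directory inside the container, relative to its working directory; HOST_EXPORT_PATH is mounted here
DOCKER_EXPORT_PATH="../export"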
REMOTE="crypt:/"
BACKUP_PREFIX="paperless_backup"
MAX_BACKUPS=6

# Function to log errors and exit
log_and_exit() {
    logger -t paperless_backup "$1"
    exit 1
}

# Generate backup file name
current_date=$(date +"%Y-%m-%d")
backup_file_name="${BACKUP_PREFIX}_${current_date}"

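# Run the document exporter inside the webserver container:
# -T disables pseudo-TTY allocation (there is no terminal under systemd), -z zips the export, -zn names the archive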
cd "$DOCKER_COMPOSE_PATH" || log_and_exit "Failed to navigate to docker-compose directory"
docker compose exec -T webserver document_exporter "$DOCKER_EXPORT_PATH" -z -zn "$backup_file_name" \
    || log_and_exit "Docker compose command failed"

# Ensure backup file creation
[ -f "$HOST_EXPORT_PATH/$backup_file_name.zip" ] || log_and_exit "Backup file not found in host export path; aborting rclone copy."

# Prune the oldest remote backup once MAX_BACKUPS archives exist
backup_count=$(rclone ls "$REMOTE" --config "$RCLONE_CONFIG" | grep -c "${BACKUP_PREFIX}_.*\.zip$")
if [ "$backup_count" -ge "$MAX_BACKUPS" ]; then
    # rclone lsl prints size, date, time, and path; sort on date and time, then keep the oldest path
    oldest_backup=$(rclone lsl "$REMOTE" --config "$RCLONE_CONFIG" | awk '/'"$BACKUP_PREFIX"'/ {print $2" "$3" "$NF}' | sort | head -n 1 | awk '{print $3}')
    if [ -n "$oldest_backup" ]; then
        # deletefile removes exactly one file (rclone delete is meant for removing the files under a path)
        rclone deletefile "$REMOTE$oldest_backup" --config "$RCLONE_CONFIG" \
            && logger -t paperless_backup "Removed oldest backup file: $oldest_backup"
    fi
fi

# Copy the backup file using rclone
rclone copy "$HOST_EXPORT_PATH/$backup_file_name.zip" "$REMOTE" --config "$RCLONE_CONFIG" -vv \
    || log_and_exit "rclone command failed"

# Verify backup file existence on remote server
rclone ls "$REMOTE" --config "$RCLONE_CONFIG" | grep -qF "$backup_file_name.zip" \
    || log_and_exit "Backup file not found on remote server; aborting deletion of local backup."

# Remove local backup file
rm "$HOST_EXPORT_PATH/$backup_file_name.zip"
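
The script prunes before it uploads, so a failed upload still costs one remote archive, and it removes at most one archive per run. One alternative is to prune only after the new archive has been verified on the remote and to delete every surplus copy at once. The sketch below is a hypothetical variant, not a drop-in part of the script: surplus_backups and prune_remote_backups are illustrative names, and it assumes every archive is named <prefix>_YYYY-MM-DD.zip (which the script's date +"%Y-%m-%d" naming produces), so sorting the names lexically also sorts them by date. It lists file names with rclone lsf --files-only and removes each surplus archive with rclone deletefile.

```shell
#!/bin/bash
# Hypothetical retention helpers (a sketch, not the script above).
# Assumes archive names follow <prefix>_YYYY-MM-DD.zip, so a lexical sort
# of the names is also a chronological sort.

# Read file names on stdin; print the matching archives that exceed the
# retention limit, oldest first. Prints nothing if there are <= max archives.
surplus_backups() {
    local prefix="$1" max="$2"
    grep -E "^${prefix}_[0-9]{4}-[0-9]{2}-[0-9]{2}\.zip$" \
        | sort \
        | awk -v max="$max" '{ names[NR] = $0 } END { for (i = 1; i <= NR - max; i++) print names[i] }'
}

# List the remote's files, then delete every surplus archive one by one.
# Intended to run only after the new archive has been verified on the remote.
prune_remote_backups() {
    local remote="$1" config="$2" prefix="$3" max="$4" name
    rclone lsf "$remote" --files-only --config "$config" \
        | surplus_backups "$prefix" "$max" \
        | while IFS= read -r name; do
            rclone deletefile "$remote$name" --config "$config" \
                && logger -t paperless_backup "Removed old backup file: $name"
        done
}
```

With the script's variables, the call would be prune_remote_backups "$REMOTE" "$RCLONE_CONFIG" "$BACKUP_PREFIX" "$MAX_BACKUPS", placed after the remote verification step instead of the pruning block. Because the new archive already exists on the remote at that point, MAX_BACKUPS copies remain afterwards, the same retention as the original. The ordering relies on the date in each file name rather than on modification times reported by the remote.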

Using a systemd Timer to Schedule the Backups


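I schedule the script with a systemd timer running under my own user account. First, create the user unit directory and the two empty unit files:

$ mkdir -p ~/.config/systemd/user && touch ~/.config/systemd/user/paperless_backup.{service,timer}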

Here are the contents of the paperless_backup.service unit file:

#### paperless_backup.service ####

[Unit]
Description=Daily Paperless Backup

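[Service]
# oneshot: the unit is "activating" while the script runs and becomes inactive once it exits
Type=oneshot
ExecStart=/bin/bash /home/tlivolsi/paperless_backup.sh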

[Install]
WantedBy=default.target

And here are the contents of the corresponding paperless_backup.timer file:

#### paperless_backup.timer ####

[Unit]
Description=Timer for Paperless Backup Service

[Timer]
# The timer causes paperless_backup.service to run every day at midnight.
OnCalendar=*-*-* 00:00:00
# If a scheduled run was missed (e.g. the machine was off), run it as soon as the timer starts again.
Persistent=true

[Install]
WantedBy=timers.target

Next, reload the user's systemd instance so it picks up the new units, enable and start the timer, and confirm that it is active:

$ systemctl --user daemon-reload && systemctl --user enable --now paperless_backup.timer && systemctl --user status paperless_backup.timer
● paperless_backup.timer - Timer for Paperless Backup Service
     Loaded: loaded (/home/tlivolsi/.config/systemd/user/paperless_backup.timer; enabled; vendor preset: enabled)
     Active: active (waiting) since Wed 2023-12-13 17:52:15 UTC; 45min ago
    Trigger: Thu 2023-12-14 00:00:00 UTC; 5h 22min left
   Triggers: ● paperless_backup.service

Dec 13 17:52:15 paperless systemd[2162085]: Started Timer for Paperless Backup Service.
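
Because these are user units, the timer only fires while the user's systemd instance is running. On a headless server, that instance normally stops once the last login session ends, unless lingering is enabled for the account. The sketch below assumes loginctl from systemd is available and that you are permitted to enable lingering for the account (this may require root or a polkit rule); ensure_linger is a hypothetical helper name.

```shell
#!/bin/bash
# Hypothetical helper: make sure a user's systemd instance keeps running
# without an active login session, so user timers still fire at midnight.
ensure_linger() {
    local user="${1:-$USER}"
    local linger
    # show-user can fail if the user has no session and no linger; treat that as "not enabled"
    linger=$(loginctl show-user "$user" --property=Linger --value 2>/dev/null)
    if [ "$linger" = "yes" ]; then
        echo "Lingering already enabled for $user"
    else
        loginctl enable-linger "$user" && echo "Enabled lingering for $user"
    fi
}
```

For example, ensure_linger tlivolsi. Afterwards, loginctl show-user tlivolsi --property=Linger should report Linger=yes.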

References

[1] Paperless-ngx
[2] Syncthing
[3] Rclone