I spent some time thinking about backup strategy, and I decided that, for my purposes, I’d like to handle the staging process (getting all the files put together) myself, and have the backup solution simply upload the files. Since I want nightly backups, the backup solution also needs incremental capabilities.
I narrowed it down to two possible solutions: Tarsnap and Duplicity. Both support incremental backups, and both are command-line capable. I decided to use Duplicity because it uploads directly to whichever back-end service you choose, be it Amazon S3 or an SFTP server. Tarsnap stores its data on S3, but that’s your only option, and it does some processing for you, which makes it cost more.
Now, on to the details.
Getting Started
Standard disclaimer: This is not at all supported by anyone and if you choose to try this, you’re doing so at your own risk. This works fine for me, but your mileage may vary. I am not in any way responsible for any costs this may incur to you, or any damage this may cause to you or your system(s).
Do NOT attempt to run any scripts you download from the internet without first fully understanding and testing them. I have only tested this on my system, and I make no guarantees that it will work on your system – you may need to modify it to do so.
I welcome any feedback, of course, and if there’s enough interest, perhaps I can turn this into a project.
Requirements
The magic here is in the scripts that power this whole process. Here are the things I wanted to (configurably) include in the backup process:
- Dumps of all the file systems on the box
- The output of bsdlabel (so I can put the partitions back together the same way)
- /etc/fstab
- root’s crontab (which I always keep in /root/crontab)
- Custom directories outside of the main filesystems – in my case, certain locations within the /tank ZFS volume.
- A way to easily and automatically exclude certain directories from the dumps (like /usr/src)
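One thing to be aware of on that last point: dump(8) honors the nodump file flag only for dumps at or above the honor level set with its -h option, which defaults to 1. That means incremental dumps skip flagged files, but a level 0 dump still includes them unless you pass -h 0. A minimal sketch of the mechanism:

# Flag a tree so dump(8) skips it.
chflags -R nodump /usr/src

# Honor the nodump flag even at level 0 (the default honor level is 1).
dump -0Lauf -h 0 - /dev/ad4s1f | bzip2 > /tank/backup/dumps/usr.dump.bz2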
Background
My Filesystem Layout
My file systems are laid out as follows. The /tank filesystem is a mirrored ZFS. I don’t want to back up everything in /tank.
Filesystem     Size    Used   Avail  Capacity  Mounted on
/dev/ad4s1a    496M    272M    185M     60%    /
devfs          1.0K    1.0K      0B    100%    /dev
/dev/ad4s1e    989M     34M    877M      4%    /tmp
/dev/ad4s1f    101G    2.9G     90G      3%    /usr
/dev/ad4s1d    1.9G    492M    1.3G     27%    /var
tank           143G    355M    143G      0%    /tank
tank/backup    208G     65G    143G     31%    /tank/backup
Dump(8) the File Systems
The dump(8) utility literally dumps individual file systems. This is important – the standard FreeBSD configuration is to have the disk sliced into separate file systems – /, /tmp, /var, and /usr. We’ll need to dump all of them individually.
For our purposes, we’re going to dump into a file. Actually, we’re going to dump to stdout and pipe it to bzip2, and then redirect bzip2’s output to a file. Here’s an example of the command:
dump -0Lauf - /dev/ad4s1f | bzip2 > /tank/backup/darkhelmet/dumps/usr.dump.bz2
If you’re an experienced UNIX/BSD/Linux user, you can probably figure out what that does, but I’ll break it down anyway:
dump:
- -0: Dump level 0 – perform a full backup. Dump allows you to specify different levels to do incremental backups. The script below supports incremental backups because you pass the dump level on the command line.
- -L: Tell dump that it’s dumping a live filesystem – this will cause dump to take a snapshot in time of the filesystem and then back up that snapshot. This is important as the contents could be in a state of flux while the dump is running.
- -a: Auto-size the dump file. We’re not writing to a tape here.
- -u: Update the contents of /etc/dumpdates. This file keeps track of the last time each file system was dumped, so dump knows what to include in the incrementals.
- -f: Write the backup to a file. In our case, we’ve specified “-” which means write the backup data to stdout.
- /dev/ad4s1f: The file system we’re backing up. On my system, this is /usr.
We then pipe (|) that output into the bzip2 utility, which writes the compressed data to its own stdout. Since we want all of that in a file, we redirect (>) bzip2’s output to a file, which will get uploaded to S3 by the backup script and duplicity.
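Before trusting a dump, it’s worth verifying it can be read back. A quick sanity check is to decompress it to stdout and have restore(8) list its table of contents:

# List the contents of a compressed dump without extracting anything.
bzip2 -dc /tank/backup/darkhelmet/dumps/usr.dump.bz2 | restore -tf -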
Install the Software
Dump is already on your system. Duplicity is not, so you need to install it via the ports collection:
cd /usr/ports/sysutils/duplicity
make install clean
FreeBSD should, of course, get all the dependencies for you.
You also need the bash shell installed.
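If you don’t have bash yet, it’s in the ports collection as well:

cd /usr/ports/shells/bash
make install clean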
Configuration
The script consists of three files: backup.sh, backup_vars.sh, and .security_vars.sh. backup.sh and backup_vars.sh are below.
Basically, you configure variables in backup_vars.sh and .security_vars.sh. I’ve included a sample backup_vars.sh below. The main backup script tells you how to create .security_vars.sh if it doesn’t see one.
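For reference, .security_vars.sh is just your AWS credentials in the same format the main script prints when the file is missing (the values here are placeholders, of course):

#!/usr/bin/env bash
AWS_ACCESS_KEY_ID="<YOUR AWS ACCESS KEY ID>"
AWS_SECRET_ACCESS_KEY="<YOUR AWS SECRET ACCESS KEY>"

Since it holds secrets, lock the permissions down:

chmod 0700 .security_vars.sh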
Testing
Once you’ve set up the scripts, you should test them. Just run:
./backup.sh 0
This should create a full dump of all the filesystems you selected and upload them to S3 (unless you specified NOUPLOAD on the command line).
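If you’d rather do a dry run first, pass NOUPLOAD as the second argument to create the dumps and staging directory locally without touching S3:

./backup.sh 0 NOUPLOAD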
Then, run:
./backup.sh 1
This will create an incremental backup – just the things that have changed since the full backup (which shouldn’t be much), and it will upload those.
Here’s a small script that will show you the collection status (see the duplicity man page for more info on this):
#!/usr/local/bin/bash

. ./.security_vars.sh
. ./backup_vars.sh

export AWS_ACCESS_KEY_ID
export AWS_SECRET_ACCESS_KEY

TMPCMD="duplicity collection-status ${AWS_DESTINATION}"
echo "Running: ${TMPCMD}"
${TMPCMD}
Scheduling
I have it in the crontab to run in the early hours of the morning, every day. On Sunday mornings, I run a level 0 (full backup), and every other morning I run a level 1.
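Here’s roughly what that schedule looks like in root’s crontab. The times and the script path (/root/scripts/backup.sh) are illustrative, so adjust them for your system:

# Full (level 0) backup early Sunday morning:
0 3 * * 0 /root/scripts/backup.sh 0
# Incremental (level 1) backup every other morning:
0 3 * * 1-6 /root/scripts/backup.sh 1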
Restoring Your System
To restore your system, you’ll need to download the files using duplicity. See the duplicity man page for how to do that. Once you’ve retrieved the files, you can re-do your partitioning, use restore(8) to restore the dumps, and then put back any custom directories.
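A rough sketch of that sequence, with illustrative local paths (/tank/restore and /mnt/usr are just examples): fetch the newest backup set with duplicity, then feed each dump to restore(8) from inside the freshly created filesystem.

# Export AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY first, as in the scripts above.
# Download the latest backup set from S3 to a local directory:
duplicity s3+http://mybackup/backup /tank/restore

# After repartitioning and newfs'ing the target, mount it and restore
# the level 0 dump from inside the new filesystem's root:
cd /mnt/usr
bzip2 -dc /tank/restore/usr.dump.level0.bz2 | restore -rf -

# Then apply any incremental (level 1) dumps the same way, in order:
bzip2 -dc /tank/restore/usr.dump.level1.bz2 | restore -rf -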
Code
These scripts are in their early stages. They’re a bit messy. Also, the syntax highlighting plugin I’m using seems to slightly mess up some indentation, but not to the point where it’s unreadable. Finally – I’ll say it one more time – I can’t guarantee that this will work, so at the very least, it should be a starting point for you to design your own backup solution.
Here’s backup_vars.sh. Configure everything here.
#!/usr/bin/env bash
# Version: 0.1

# FSLIST: The list of file systems that will be dumped, along with the name of the dump.
# Example: /dev/ad4s1a=root will dump the /dev/ad4s1a volume and name it root.dump.levelX.bz2
FSLIST="/dev/ad4s1a=root /dev/ad4s1f=usr /dev/ad4s1d=var"

# BSDLABEL_PARTITIONS: The list of partitions to run bsdlabel on.
# Each will be saved in the staging directory as bsdlabel_<partition>.txt.
BSDLABEL_PARTITIONS="ad4s1"

# DUMPDIR: The directory that the dumps will be written to.
DUMPDIR="/tank/backup/dumps"

# BACKUP_STAGING_DIR: The directory that files will be copied to prior to uploading.
# The remote directory will be a replica of the staging directory.
BACKUP_STAGING_DIR="/tank/backup/staging"

# CUSTOM_DIRS: Add custom directories to the staging directory.
# Separate the source and destination with an equals sign.
CUSTOM_DIRS="/tank/backup/web1/=${BACKUP_STAGING_DIR}/web1/ /tank/backup/web2/=${BACKUP_STAGING_DIR}/web2/"

# NOUPLOAD: Should be set to "0" by default.
NOUPLOAD=0

# NODUMP_DIRS: List of directories to set the nodump flag on.
NODUMP_DIRS="/usr/ports /usr/obj /usr/src"

# TMPDIR: The temporary directory that duplicity should use.
TMPDIR="/tank/backup/darkhelmet/tmp"

# FULL_BACKUPS_TO_KEEP: The number of full backups (and associated incrementals) to keep.
FULL_BACKUPS_TO_KEEP=5

# Amazon destination
AWS_DESTINATION="s3+http://mybackup/backup"

###########################################
# Variables for duplicity must be exported!
export TMPDIR
Here’s the actual backup script:
#!/usr/bin/env bash
# Version: 0.1

# Get the directory we're running from.
SCRIPTDIR=`dirname $0`
cd ${SCRIPTDIR}
if [ $? -ne 0 ]; then
    echo "ERROR: Unable to cd to ${SCRIPTDIR}!"
    exit 1
fi

if [ ! -f "./backup_vars.sh" ]; then
    echo "ERROR: backup_vars.sh does not exist."
    exit 1
else
    . ./backup_vars.sh
fi

if [ ! -f "./.security_vars.sh" ]; then
    echo "ERROR: .security_vars.sh does not exist."
    echo "Please create it with the following format:"
    echo ""
    echo "#!/usr/bin/env bash"
    echo "AWS_ACCESS_KEY_ID=\"<YOUR AWS ACCESS KEY ID>\""
    echo "AWS_SECRET_ACCESS_KEY=\"<YOUR AWS SECRET ACCESS KEY>\""
    echo ""
    echo "Note: This file should be secure, probably mode 0700."
    exit 1
else
    . ./.security_vars.sh
    export AWS_ACCESS_KEY_ID
    export AWS_SECRET_ACCESS_KEY
fi

# If we were executed like "./whatever.sh" - set SCRIPTDIR to the pwd.
if [ "${SCRIPTDIR}" == "." ]; then
    SCRIPTDIR=`pwd`
fi

echo "Script is running from ${SCRIPTDIR}"

# Check the command line.
if [ "${1}" == "" ]; then
    echo "Must specify dump level."
    exit 1
else
    DUMPLVL=${1}
fi

if [ "${2}" == "NOUPLOAD" ]; then
    echo "NOUPLOAD was specified, will run all backups but will not upload."
    echo ""
    NOUPLOAD=1
fi

# Sanity check.
if [ "${DUMPLVL}" == "" ]; then
    echo "ERROR: For some reason DUMPLVL never got set!"
    echo ""
    exit 1
fi

# Create the flag file so we can't run more than one instance.
if [ -f "${SCRIPTDIR}/running.flg" ]; then
    echo "Script is already running - ${SCRIPTDIR}/running.flg exists!"
    echo ""
    exit 1
else
    echo ""
    echo "Creating ${SCRIPTDIR}/running.flg"
    touch ${SCRIPTDIR}/running.flg
fi

for DIR in ${NODUMP_DIRS}; do
    echo ""
    echo "Setting nodump on ${DIR}"
    chflags -R nodump ${DIR}/
done

echo ""
echo "Dump Level: ${DUMPLVL}"

for FSITEM in ${FSLIST}; do
    FS=`echo ${FSITEM} | awk -F= '{ print $1 }'`
    NAME=`echo ${FSITEM} | awk -F= '{ print $2 }'`
    echo ""
    echo "FS: ${FS}"
    echo "NAME: ${NAME}"
    echo "LEVEL: ${DUMPLVL}"
    echo "dump -${DUMPLVL}Lauf - ${FS} | bzip2 -9 > ${DUMPDIR}/${NAME}.dump.level${DUMPLVL}.bz2"
    dump -${DUMPLVL}Lauf - ${FS} | bzip2 -9 > ${DUMPDIR}/${NAME}.dump.level${DUMPLVL}.bz2
done

# Set up the backup staging directory.
if [ ! -d ${BACKUP_STAGING_DIR} ]; then
    echo "Backup directory ${BACKUP_STAGING_DIR} does not exist, creating."
    mkdir -p ${BACKUP_STAGING_DIR}
    if [ $? -ne 0 ]; then
        echo "Error creating backup directory!"
        exit 1
    fi
fi

echo ""
echo "Copying fstab to ${BACKUP_STAGING_DIR}"
cp /etc/fstab ${BACKUP_STAGING_DIR}/fstab

for PARTITION in ${BSDLABEL_PARTITIONS}; do
    echo ""
    echo "Running bsdlabel for ${PARTITION} -> ${BACKUP_STAGING_DIR}/bsdlabel_${PARTITION}.txt"
    bsdlabel ${PARTITION} > ${BACKUP_STAGING_DIR}/bsdlabel_${PARTITION}.txt
done

echo ""
echo "Copying $0 to ${BACKUP_STAGING_DIR}"
cp $0 ${BACKUP_STAGING_DIR}/`basename $0`

for CUSTDIRITEM in ${CUSTOM_DIRS}; do
    SRCDIR=`echo ${CUSTDIRITEM} | awk -F= '{ print $1 }'`
    DESTDIR=`echo ${CUSTDIRITEM} | awk -F= '{ print $2 }'`
    echo ""
    echo "Source Directory: ${SRCDIR}"
    echo "Destination Directory: ${DESTDIR}"
    if [ ! -d "${DESTDIR}" ]; then
        mkdir -p ${DESTDIR}
    fi
    cp -R ${SRCDIR}/* ${DESTDIR}
done

echo ""
echo "Copying /root/crontab to ${BACKUP_STAGING_DIR}"
cp -f /root/crontab ${BACKUP_STAGING_DIR}/root_crontab

echo ""
echo "Copying dumps from ${DUMPDIR} to ${BACKUP_STAGING_DIR}"
cp -v ${DUMPDIR}/*.bz2 ${BACKUP_STAGING_DIR}/

# Do a full backup on dump level 0.
if [ "${DUMPLVL}" == "0" ]; then
    BACKUP_TYPE="full"
else
    BACKUP_TYPE=""
fi

# Upload it!
if [ "${NOUPLOAD}" != "1" ]; then
    echo ""
    echo "Running duplicity"
    TMP_CMD="duplicity ${BACKUP_TYPE} --no-encryption ${BACKUP_STAGING_DIR} ${AWS_DESTINATION}"
    echo ${TMP_CMD}
    ${TMP_CMD}
    TMP_CMD="duplicity remove-all-but-n-full ${FULL_BACKUPS_TO_KEEP} --force ${AWS_DESTINATION}"
    echo ${TMP_CMD}
    ${TMP_CMD}
    TMP_CMD="duplicity collection-status ${AWS_DESTINATION}"
    echo ${TMP_CMD}
    ${TMP_CMD}
else
    echo ""
    echo "NOUPLOAD was specified, duplicity skipped."
fi

if [ "${NOUPLOAD}" != "1" ]; then
    echo ""
    echo "Clearing backup staging directory."
    rm -rf ${BACKUP_STAGING_DIR}/
fi

echo ""
echo "Removing ${SCRIPTDIR}/running.flg"
rm -f ${SCRIPTDIR}/running.flg

export AWS_ACCESS_KEY_ID=
export AWS_SECRET_ACCESS_KEY=