I spent some time thinking about backup strategy, and I decided that, for my purposes, I’d like to handle the staging process (getting all the files put together) myself, and have the backup solution simply upload the files. Since I want nightly backups, the backup solution also needs incremental capabilities.
I narrowed it down to two possible solutions: Tarsnap and Duplicity. Both support incremental backups, and both are command-line capable. I decided to use Duplicity because it uploads directly to whichever back-end service you choose, be it Amazon S3 or an SFTP server. Tarsnap stores its data on S3, but that’s your only option, and it does some processing for you, which makes it cost more.
Now, on to the details.
Getting Started
Standard disclaimer: This is not at all supported by anyone and if you choose to try this, you’re doing so at your own risk. This works fine for me, but your mileage may vary. I am not in any way responsible for any costs this may incur to you, or any damage this may cause to you or your system(s).
Do NOT attempt to run any scripts you download from the internet without first fully understanding and testing them. I have only tested this on my system, and I make no guarantees that it will work on your system – you may need to modify it to do so.
I welcome any feedback, of course, and if there’s enough interest, perhaps I can turn this into a project.
Requirements
The magic here is in the scripts that power this whole process. Here are the things I wanted to (configurably) include in the backup process:
- Dumps of all the file systems on the box
- The output of bsdlabel (so I can put the partitions back together the same way)
- /etc/fstab
- root’s crontab (which I always keep in /root/crontab)
- Custom directories outside of the main filesystems – in my case, certain locations within the /tank ZFS volume.
- A way to easily and automatically exclude certain directories from the dumps (like /usr/src)
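One thing to be aware of on that last point: dump(8) honors the nodump file flag only for dumps at or above the honor level set with its -h option, which defaults to 1. That means incremental dumps skip flagged files, but a level 0 dump still includes them unless you pass -h 0. A minimal sketch of the mechanism:

# Flag a tree so dump(8) skips it.
chflags -R nodump /usr/src

# Honor the nodump flag even at level 0 (the default honor level is 1).
dump -0Lauf -h 0 - /dev/ad4s1f | bzip2 > /tank/backup/dumps/usr.dump.bz2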
Background
My Filesystem Layout
My file systems are laid out as follows. The /tank filesystem is a mirrored ZFS. I don’t want to back up everything in /tank.
Filesystem     Size    Used   Avail  Capacity  Mounted on
/dev/ad4s1a    496M    272M    185M     60%    /
devfs          1.0K    1.0K      0B    100%    /dev
/dev/ad4s1e    989M     34M    877M      4%    /tmp
/dev/ad4s1f    101G    2.9G     90G      3%    /usr
/dev/ad4s1d    1.9G    492M    1.3G     27%    /var
tank           143G    355M    143G      0%    /tank
tank/backup    208G     65G    143G     31%    /tank/backup
Dump(8) the File Systems
The dump(8) utility literally dumps individual file systems. This is important – the standard FreeBSD configuration is to have the disk sliced into separate file systems – /, /tmp, /var, and /usr. We’ll need to dump all of them individually.
For our purposes, we’re going to dump into a file. Actually, we’re going to dump to stdout and pipe it to bzip2, and then redirect bzip2’s output to a file. Here’s an example of the command:
dump -0Lauf - /dev/ad4s1f | bzip2 > /tank/backup/darkhelmet/dumps/usr.dump.bz2
If you’re an experienced UNIX/BSD/Linux user, you can probably figure out what that does, but I’ll break it down anyway:
dump:
- -0: Dump level 0 – perform a full backup. Dump allows you to specify different levels to do incremental backups. The script below supports incremental backups because you pass the dump level on the command line.
- -L: Tell dump that it’s dumping a live filesystem – this will cause dump to take a snapshot in time of the filesystem and then back up that snapshot. This is important as the contents could be in a state of flux while the dump is running.
- -a: Auto-size the dump file. We’re not writing to a tape here.
- -u: Update the contents of /etc/dumpdates. This file keeps track of the last time each file system was dumped, so dump knows what to include in the incrementals.
- -f: Write the backup to a file. In our case, we’ve specified “-” which means write the backup data to stdout.
- /dev/ad4s1f: The file system we’re backing up. On my system, this is /usr.
We then pipe (|) that output into the bzip2 utility, which writes the compressed data to its own stdout. Since we want all of that in a file, we redirect (>) bzip2’s output to a file, which will get uploaded to S3 by the backup script and duplicity.
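Before trusting a dump, it’s worth verifying it can be read back. A quick sanity check is to decompress it to stdout and have restore(8) list its table of contents:

# List the contents of a compressed dump without extracting anything.
bzip2 -dc /tank/backup/darkhelmet/dumps/usr.dump.bz2 | restore -tf -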
Install the Software
Dump is already on your system. Duplicity is not, so you need to install it via the ports collection:
cd /usr/ports/sysutils/duplicity
make install clean
FreeBSD should, of course, get all the dependencies for you.
You also need the bash shell installed.
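If you don’t have bash yet, it’s in the ports collection as well:

cd /usr/ports/shells/bash
make install clean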
Configuration
The script consists of three files: backup.sh, backup_vars.sh, and .security_vars.sh. backup.sh and backup_vars.sh are below.
Basically, you configure variables in backup_vars.sh and .security_vars.sh. I’ve included a sample backup_vars.sh below. The main backup script tells you how to create .security_vars.sh if it doesn’t see one.
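For reference, .security_vars.sh is just your AWS credentials in the same format the main script prints when the file is missing (the values here are placeholders, of course):

#!/usr/bin/env bash
AWS_ACCESS_KEY_ID="<YOUR AWS ACCESS KEY ID>"
AWS_SECRET_ACCESS_KEY="<YOUR AWS SECRET ACCESS KEY>"

Since it holds secrets, lock the permissions down:

chmod 0700 .security_vars.sh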
Testing
Once you’ve set up the scripts, you should test them. Just run:
./backup.sh 0
This should create a full dump of all the filesystems you selected and upload them to S3 (unless you specified NOUPLOAD on the command line).
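If you’d rather do a dry run first, pass NOUPLOAD as the second argument to create the dumps and staging directory locally without touching S3:

./backup.sh 0 NOUPLOAD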
Then, run:
./backup.sh 1
This will create an incremental backup – just the things that have changed since the full backup (which shouldn’t be much), and it will upload those.
Here’s a small script that will show you the collection status (see the duplicity man page for more info on this):
#!/usr/local/bin/bash

. ./.security_vars.sh
. ./backup_vars.sh

export AWS_ACCESS_KEY_ID
export AWS_SECRET_ACCESS_KEY

TMPCMD="duplicity collection-status ${AWS_DESTINATION}"
echo "Running: ${TMPCMD}"
${TMPCMD}
Scheduling
I have it in the crontab to run in the early hours of the morning, every day. On Sunday mornings, I run a level 0 (full backup), and every other morning I run a level 1.
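Here’s roughly what that schedule looks like in root’s crontab. The times and the script path (/root/scripts/backup.sh) are illustrative, so adjust them for your system:

# Full (level 0) backup early Sunday morning:
0 3 * * 0 /root/scripts/backup.sh 0
# Incremental (level 1) backup every other morning:
0 3 * * 1-6 /root/scripts/backup.sh 1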
Restoring Your System
To restore your system, you’ll need to download the files using duplicity. See the duplicity man page for how to do that. Once you’ve retrieved the files, you can re-do your partitioning, use restore(8) to restore the dumps, and then put back any custom directories.
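A rough sketch of that sequence, with illustrative local paths (/tank/restore and /mnt/usr are just examples): fetch the newest backup set with duplicity, then feed each dump to restore(8) from inside the freshly created filesystem.

# Export AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY first, as in the scripts above.
# Download the latest backup set from S3 to a local directory:
duplicity s3+http://mybackup/backup /tank/restore

# After repartitioning and newfs'ing the target, mount it and restore
# the level 0 dump from inside the new filesystem's root:
cd /mnt/usr
bzip2 -dc /tank/restore/usr.dump.level0.bz2 | restore -rf -

# Then apply any incremental (level 1) dumps the same way, in order:
bzip2 -dc /tank/restore/usr.dump.level1.bz2 | restore -rf -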
Code
These scripts are in their early stages. They’re a bit messy. Also, the syntax highlighting plugin I’m using seems to slightly mess up some indentation, but not to the point where it’s unreadable. Finally – I’ll say it one more time – I can’t guarantee that this will work, so at the very least, it should be a starting point for you to design your own backup solution.
Here’s backup_vars.sh. Configure everything here.
#!/usr/bin/env bash
# Version: 0.1

# FSLIST: The list of file systems that will be dumped, along with the name of the dump.
# Example: /dev/ad4s1a=root will dump the /dev/ad4s1a volume and name it root.dump.levelX.bz2
FSLIST="/dev/ad4s1a=root /dev/ad4s1f=usr /dev/ad4s1d=var"

# BSDLABEL_PARTITIONS: The list of partitions to run bsdlabel on.
# Each will be saved in the staging directory as bsdlabel_<partition>.txt.
BSDLABEL_PARTITIONS="ad4s1"

# DUMPDIR: The directory that the dumps will be written to.
DUMPDIR="/tank/backup/dumps"

# BACKUP_STAGING_DIR: The directory that files will be copied to prior to uploading.
# The remote directory will be a replica of the staging directory.
BACKUP_STAGING_DIR="/tank/backup/staging"

# CUSTOM_DIRS: Add custom directories to the staging directory.
# Separate the source and destination with an equals sign.
CUSTOM_DIRS="/tank/backup/web1/=${BACKUP_STAGING_DIR}/web1/ /tank/backup/web2/=${BACKUP_STAGING_DIR}/web2/"

# NOUPLOAD: Should be set to "0" by default.
NOUPLOAD=0

# NODUMP_DIRS: List of directories to set the nodump flag on.
NODUMP_DIRS="/usr/ports /usr/obj /usr/src"

# TMPDIR: The temporary directory that duplicity should use.
TMPDIR="/tank/backup/darkhelmet/tmp"

# FULL_BACKUPS_TO_KEEP: The number of full backups (and associated incrementals) to keep.
FULL_BACKUPS_TO_KEEP=5

# Amazon destination
AWS_DESTINATION="s3+http://mybackup/backup"

###########################################
# Variables for duplicity must be exported!
export TMPDIR
Here’s the actual backup script:
#!/usr/bin/env bash
# Version: 0.1

# Get the directory we're running from.
SCRIPTDIR=`dirname $0`
cd ${SCRIPTDIR}
if [ $? -ne 0 ]; then
    echo "ERROR: Unable to cd to ${SCRIPTDIR}!"
    exit 1
fi

if [ ! -f "./backup_vars.sh" ]; then
    echo "ERROR: backup_vars.sh does not exist."
    exit 1
else
    . ./backup_vars.sh
fi

if [ ! -f "./.security_vars.sh" ]; then
    echo "ERROR: .security_vars.sh does not exist."
    echo "Please create it with the following format:"
    echo ""
    echo "#!/usr/bin/env bash"
    echo "AWS_ACCESS_KEY_ID=\"<YOUR AWS ACCESS KEY ID>\""
    echo "AWS_SECRET_ACCESS_KEY=\"<YOUR AWS SECRET ACCESS KEY>\""
    echo ""
    echo "Note: This file should be secure, probably mode 0700."
    exit 1
else
    . ./.security_vars.sh
    export AWS_ACCESS_KEY_ID
    export AWS_SECRET_ACCESS_KEY
fi

# If we were executed like "./whatever.sh" - set SCRIPTDIR to the pwd.
if [ "${SCRIPTDIR}" == "." ]; then
    SCRIPTDIR=`pwd`
fi

echo "Script is running from ${SCRIPTDIR}"

# Check the command line.
if [ "${1}" == "" ]; then
    echo "Must specify dump level."
    exit 1
else
    DUMPLVL=${1}
fi

if [ "${2}" == "NOUPLOAD" ]; then
    echo "NOUPLOAD was specified, will run all backups but will not upload."
    echo ""
    NOUPLOAD=1
fi

# Sanity check.
if [ "${DUMPLVL}" == "" ]; then
    echo "ERROR: For some reason DUMPLVL never got set!"
    echo ""
    exit 1
fi

# Create the flag file so we can't run more than one instance.
if [ -f "${SCRIPTDIR}/running.flg" ]; then
    echo "Script is already running - ${SCRIPTDIR}/running.flg exists!"
    echo ""
    exit 1
else
    echo ""
    echo "Creating ${SCRIPTDIR}/running.flg"
    touch ${SCRIPTDIR}/running.flg
fi

for DIR in ${NODUMP_DIRS}; do
    echo ""
    echo "Setting nodump on ${DIR}"
    chflags -R nodump ${DIR}/
done

echo ""
echo "Dump Level: ${DUMPLVL}"

for FSITEM in ${FSLIST}; do
    FS=`echo ${FSITEM} | awk -F= '{ print $1 }'`
    NAME=`echo ${FSITEM} | awk -F= '{ print $2 }'`
    echo ""
    echo "FS: ${FS}"
    echo "NAME: ${NAME}"
    echo "LEVEL: ${DUMPLVL}"
    echo "dump -${DUMPLVL}Lauf - ${FS} | bzip2 -9 > ${DUMPDIR}/${NAME}.dump.level${DUMPLVL}.bz2"
    dump -${DUMPLVL}Lauf - ${FS} | bzip2 -9 > ${DUMPDIR}/${NAME}.dump.level${DUMPLVL}.bz2
done

# Set up the backup staging directory.
if [ ! -d ${BACKUP_STAGING_DIR} ]; then
    echo "Backup directory ${BACKUP_STAGING_DIR} does not exist, creating."
    mkdir -p ${BACKUP_STAGING_DIR}
    if [ $? -ne 0 ]; then
        echo "Error creating backup directory!"
        exit 1
    fi
fi

echo ""
echo "Copying fstab to ${BACKUP_STAGING_DIR}"
cp /etc/fstab ${BACKUP_STAGING_DIR}/fstab

for PARTITION in ${BSDLABEL_PARTITIONS}; do
    echo ""
    echo "Running bsdlabel for ${PARTITION} -> ${BACKUP_STAGING_DIR}/bsdlabel_${PARTITION}.txt"
    bsdlabel ${PARTITION} > ${BACKUP_STAGING_DIR}/bsdlabel_${PARTITION}.txt
done

echo ""
echo "Copying $0 to ${BACKUP_STAGING_DIR}"
cp $0 ${BACKUP_STAGING_DIR}/`basename $0`

for CUSTDIRITEM in ${CUSTOM_DIRS}; do
    SRCDIR=`echo ${CUSTDIRITEM} | awk -F= '{ print $1 }'`
    DESTDIR=`echo ${CUSTDIRITEM} | awk -F= '{ print $2 }'`
    echo ""
    echo "Source Directory: ${SRCDIR}"
    echo "Destination Directory: ${DESTDIR}"
    if [ ! -d "${DESTDIR}" ]; then
        mkdir -p ${DESTDIR}
    fi
    cp -R ${SRCDIR}/* ${DESTDIR}
done

echo ""
echo "Copying /root/crontab to ${BACKUP_STAGING_DIR}"
cp -f /root/crontab ${BACKUP_STAGING_DIR}/root_crontab

echo ""
echo "Copying dumps from ${DUMPDIR} to ${BACKUP_STAGING_DIR}"
cp -v ${DUMPDIR}/*.bz2 ${BACKUP_STAGING_DIR}/

# Do a full backup on dump level 0.
if [ "${DUMPLVL}" == "0" ]; then
    BACKUP_TYPE="full"
else
    BACKUP_TYPE=""
fi

# Upload it!
if [ "${NOUPLOAD}" != "1" ]; then
    echo ""
    echo "Running duplicity"
    TMP_CMD="duplicity ${BACKUP_TYPE} --no-encryption ${BACKUP_STAGING_DIR} ${AWS_DESTINATION}"
    echo ${TMP_CMD}
    ${TMP_CMD}
    TMP_CMD="duplicity remove-all-but-n-full ${FULL_BACKUPS_TO_KEEP} --force ${AWS_DESTINATION}"
    echo ${TMP_CMD}
    ${TMP_CMD}
    TMP_CMD="duplicity collection-status ${AWS_DESTINATION}"
    echo ${TMP_CMD}
    ${TMP_CMD}
else
    echo ""
    echo "NOUPLOAD was specified, duplicity skipped."
fi

if [ "${NOUPLOAD}" != "1" ]; then
    echo ""
    echo "Clearing backup staging directory."
    rm -rf ${BACKUP_STAGING_DIR}/
fi

echo ""
echo "Removing ${SCRIPTDIR}/running.flg"
rm -f ${SCRIPTDIR}/running.flg

export AWS_ACCESS_KEY_ID=
export AWS_SECRET_ACCESS_KEY=