FreeBSD Backup Using JungleDisk and Dump

I previously discussed configuring JungleDisk on FreeBSD.  It’s not the easiest thing to install, since FreeBSD isn’t officially supported.  To take that a step further, I’m now going to show what I do to back up my FreeBSD box at home.

Update, November 2009: I am no longer using JungleDisk to back up my FreeBSD box.  JungleDisk recently released version 3.0 of their software, which does not include a command-line Linux version in the standard desktop edition.  I was advised to stick with the old version if I want to continue backing up.  Instead, I chose to change over to Duplicity.  I will write a post on Duplicity in the near future.

There are a couple of steps to this process.  First, we must perform the backup itself.  I’m using dump(8) for this purpose – it’s built right into FreeBSD.  Its purpose in the original UNIX was to dump a file system to a tape drive, but we’re going to use it to dump each filesystem to a file instead.  The second step is to have JungleDisk back those files up to S3.

Standard disclaimer:  This is not at all supported by JungleDisk and if you choose to try this, you’re doing so at your own risk.  This works fine for me, but your mileage may vary.  I am not in any way responsible for any costs this may incur to you, or any damage this may cause.

Filesystem Layout

Let’s talk about my FreeBSD box.  Its primary purpose is network-attached storage, and for that purpose I have a ZFS filesystem mounted on /tank.  Other than that, it’s pretty standard.  Here’s my “df -h” output, for reference:

[root@darkhelmet ~]# df -h
Filesystem       Size    Used   Avail Capacity  Mounted on
/dev/ad4s1a      496M    423M     33M    93%    /
devfs            1.0K    1.0K      0B   100%    /dev
/dev/ad4s1e      989M    2.4M    908M     0%    /tmp
/dev/ad4s1f      101G    3.1G     90G     3%    /usr
/dev/ad4s1d      1.9G    322M    1.5G    18%    /var

Dump(8) The File Systems

The dump(8) utility literally dumps individual file systems.  This is important – the standard FreeBSD configuration is to have the disk sliced into separate file systems – /, /tmp, /var, and /usr.  We’ll need to dump all of them individually.

For our purposes, we’re going to dump into a file.  Actually, we’re going to dump to stdout and pipe it to gzip, and then redirect gzip’s output to a file.  Here’s the command:

dump -0Lauf - /dev/ad4s1f | gzip > /tank/backup/darkhelmet/dumps/usr.dump.gz

If you’re an experienced UNIX/BSD/Linux user, you can probably figure out what that does, but I’ll break it down in case you’re more of a novice:

dump:

  • -0: Dump level 0 – perform a full backup.  Dump allows you to specify different levels to do incremental backups.  I’m not going to do incremental backups at this time, so we’ll always leave the dump level at 0.
  • -L: Tell dump that it’s dumping a live filesystem – this will cause dump to take a snapshot in time of the filesystem and then back up that snapshot.  This is important as the contents could be in a state of flux while the dump is running.
  • -a: Auto-size the dump file.  We’re not writing to a tape here.
  • -u: Update the contents of /etc/dumpdates.  This file keeps track of the last time each file system was dumped, in case you want to start doing incremental backups.
  • -f: Write the backup to a file.  In our case, we’ve specified “-” which means write the backup data to stdout.
  • /dev/ad4s1f: The file system we’re backing up.  On my system, this is /usr.

We then pipe (|) that output into the gzip utility, which writes the compressed data to stdout.  Since we want all of that in a file, we then redirect (>) the gzip output to a file.

When that command is run, I end up with a file called usr.dump.gz in /tank/backup/darkhelmet/dumps, and that will be the file I will have JungleDisk back up to S3.
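Before handing a dump file off for upload, it doesn’t hurt to sanity-check it.  restore(8) with -t lists a dump’s table of contents without extracting anything.  Here’s a sketch using the paths from my setup, guarded so it does nothing on a machine without restore(8):

```shell
#!/bin/sh
# List the first entries of the compressed dump to confirm it's readable.
# Path is from my setup; guarded so this is a no-op without restore(8).
DUMPFILE="/tank/backup/darkhelmet/dumps/usr.dump.gz"
if [ -f "${DUMPFILE}" ] && command -v restore >/dev/null 2>&1; then
        gzcat "${DUMPFILE}" | restore -tf - | head -20
fi
```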

Once again, it’s important to note that you must dump each filesystem individually.

Script it Out

I wrote a simple shell script to loop through all of my filesystems and dump each of them.  I probably could have automated it more by parsing the output of a mount command, but I didn’t want to get too complicated with it.

#!/bin/sh
# Filesystems to dump, as "device=name" pairs, and where to put the dumps
FSLIST="/dev/ad4s1a=root /dev/ad4s1f=usr /dev/ad4s1d=var"
DUMPDIR="/tank/backup/darkhelmet/dumps"
for FSITEM in ${FSLIST}; do
        FS=$(echo ${FSITEM} | awk -F= '{ print $1 }')
        NAME=$(echo ${FSITEM} | awk -F= '{ print $2 }')
        echo "FS: ${FS}"
        echo "NAME: ${NAME}"
        echo "dump -0Lauf - ${FS} | gzip > ${DUMPDIR}/${NAME}.dump.gz"
        dump -0Lauf - ${FS} | gzip > ${DUMPDIR}/${NAME}.dump.gz
done

I wrote this script to be easily configurable.  The FSLIST variable contains a space-separated list of the filesystems I want to back up and their names, as a “key=value” style list.  The DUMPDIR variable tells the script where to put the dump files.

We then loop through the ${FSLIST} variable, and use awk to separate the values and get the file system into the ${FS} variable and the name into the ${NAME} variable.  Finally, we use those two variables, along with the ${DUMPDIR} variable, to construct our command lines.
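As an aside, the same “device=name” split can be done with POSIX parameter expansion, avoiding the two awk subshells per filesystem.  A minimal sketch of that variation, using the same FSLIST format:

```shell
#!/bin/sh
# Same "device=name" split as the awk version, using POSIX parameter
# expansion instead of spawning awk twice per filesystem.
FSLIST="/dev/ad4s1a=root /dev/ad4s1f=usr /dev/ad4s1d=var"
for FSITEM in ${FSLIST}; do
        FS=${FSITEM%%=*}        # everything before the first '='
        NAME=${FSITEM#*=}       # everything after the first '='
        echo "FS: ${FS}"
        echo "NAME: ${NAME}"
done
```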

On my system, this script essentially runs the following three commands:

dump -0Lauf - /dev/ad4s1a | gzip > /tank/backup/darkhelmet/dumps/root.dump.gz
dump -0Lauf - /dev/ad4s1f | gzip > /tank/backup/darkhelmet/dumps/usr.dump.gz
dump -0Lauf - /dev/ad4s1d | gzip > /tank/backup/darkhelmet/dumps/var.dump.gz

Here’s the full output of this script:

[root@darkhelmet /tank/backup/darkhelmet]# ./dh_backup.sh
FS: /dev/ad4s1a
NAME: root
dump -0Lauf - /dev/ad4s1a | gzip > /tank/backup/darkhelmet/dumps/root.dump.gz
  DUMP: Date of this level 0 dump: Sun Mar  1 16:54:35 2009
  DUMP: Date of last level 0 dump: the epoch
  DUMP: Dumping snapshot of /dev/ad4s1a (/) to standard output
  DUMP: mapping (Pass I) [regular files]
  DUMP: mapping (Pass II) [directories]
  DUMP: estimated 427200 tape blocks.
  DUMP: dumping (Pass III) [directories]
  DUMP: dumping (Pass IV) [regular files]
  DUMP: DUMP: 427199 tape blocks
  DUMP: finished in 67 seconds, throughput 6376 KBytes/sec
  DUMP: level 0 dump on Sun Mar  1 16:54:35 2009
  DUMP: DUMP IS DONE
FS: /dev/ad4s1f
NAME: usr
dump -0Lauf - /dev/ad4s1f | gzip > /tank/backup/darkhelmet/dumps/usr.dump.gz
  DUMP: Date of this level 0 dump: Sun Mar  1 16:57:06 2009
  DUMP: Date of last level 0 dump: the epoch
  DUMP: Dumping snapshot of /dev/ad4s1f (/usr) to standard output
  DUMP: mapping (Pass I) [regular files]
  DUMP: mapping (Pass II) [directories]
  DUMP: estimated 3147647 tape blocks.
  DUMP: dumping (Pass III) [directories]
  DUMP: dumping (Pass IV) [regular files]
  DUMP: 34.90% done, finished in 0:09 at Sun Mar  1 17:11:27 2009
  DUMP: 77.39% done, finished in 0:02 at Sun Mar  1 17:10:03 2009
  DUMP: DUMP: 3148664 tape blocks
  DUMP: finished in 797 seconds, throughput 3950 KBytes/sec
  DUMP: level 0 dump on Sun Mar  1 16:57:06 2009
  DUMP: DUMP IS DONE
FS: /dev/ad4s1d
NAME: var
dump -0Lauf - /dev/ad4s1d | gzip > /tank/backup/darkhelmet/dumps/var.dump.gz
  DUMP: Date of this level 0 dump: Sun Mar  1 17:11:16 2009
  DUMP: Date of last level 0 dump: the epoch
  DUMP: Dumping snapshot of /dev/ad4s1d (/var) to standard output
  DUMP: mapping (Pass I) [regular files]
  DUMP: mapping (Pass II) [directories]
  DUMP: estimated 338748 tape blocks.
  DUMP: dumping (Pass III) [directories]
  DUMP: dumping (Pass IV) [regular files]
  DUMP: DUMP: 338680 tape blocks
  DUMP: finished in 174 seconds, throughput 1946 KBytes/sec
  DUMP: level 0 dump on Sun Mar  1 17:11:16 2009
  DUMP: DUMP IS DONE
[root@darkhelmet /tank/backup/darkhelmet]#

As you can see, we get some pretty detailed output from our dump commands. Here are the files I have in my dump directory:

[root@darkhelmet /tank/backup/darkhelmet/dumps]# ls -alh
total 1532535
drwxr-xr-x  2 root  dave     5B Mar  1 16:03 .
drwxr-xr-x  4 dave  dave     5B Mar  1 16:54 ..
-rw-r--r--  1 root  dave   127M Mar  1 16:55 root.dump.gz
-rw-r--r--  1 root  dave   1.1G Mar  1 17:10 usr.dump.gz
-rw-r--r--  1 root  dave   230M Mar  1 17:14 var.dump.gz
[root@darkhelmet /tank/backup/darkhelmet/dumps]#

I now have my entire FreeBSD system (except /tmp and /dev) completely backed up into three nice (relatively small) files.

Upload to S3

The next step is to upload them using JungleDisk. Remember, when running under FreeBSD’s Linux binary compatibility layer, JungleDisk sees directories relative to /usr/compat/linux. So in my jungledisk-settings.xml, I have configured my backup directory to be /backups.  On the real filesystem, that’s /usr/compat/linux/backups.  Oddly, though, JungleDisk uses the real filesystem layout for the configuration file you specify on the command line.  I haven’t figured this one out yet, but hey, it works!

I’ve added to my shell script to copy the dump files over to the /usr/compat/linux/backups directory.  You might instead consider having the dumps go directly into that directory.
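The copy itself is only a couple of lines in the script.  A sketch, with one caveat: the compat prefix is from my install, and on some FreeBSD versions it is /compat/linux instead:

```shell
#!/bin/sh
# Stage the dumps where the Linux-mode JungleDisk will see them as
# /backups. The compat prefix may be /compat/linux on your system.
DUMPDIR="/tank/backup/darkhelmet/dumps"
STAGEDIR="/usr/compat/linux/backups"
if [ -d "${DUMPDIR}" ]; then
        mkdir -p "${STAGEDIR}"
        cp "${DUMPDIR}"/*.dump.gz "${STAGEDIR}/"
fi
```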

Set up your copy of JungleDisk using the tutorial in my JungleDisk on FreeBSD post, and copy the jungledisk-settings.xml to the BSD box.

Add the JungleDisk command to your shell script:

cd /path/to/jungledisk/binary
./jungledisk -o config=/tank/backup/jungledisk/jungledisk-settings.xml --startbackups -f --exit -d

Of course, feel free to tweak the command-line options.  Here’s what this command line does:

  • -o config=/tank/backup/jungledisk/jungledisk-settings.xml: Points JungleDisk to its XML configuration file.
  • --startbackups: Start all backups in the configuration file immediately.
  • -f: Stay in the foreground – I haven’t played with daemonizing JungleDisk, and since I’m going to run this shell script from cron, there’s no need to fork into the background.
  • --exit: Exit when idle.  This will cause JungleDisk to exit a few moments after the backup completes.
  • -d: Enable debugging.  Most of this information is useless as far as I’m concerned, but I don’t mind seeing it.

Notes

You should probably have your script save the bsdlabel of your disks to a text file that will be uploaded along with your dump files.  To do this:

bsdlabel [disk] > /path/to/bsdlabel.txt

On my system, it is:

bsdlabel ad4s1 > /tank/backup/darkhelmet/dumps/bsdlabel.txt

Additionally, it’s a good idea to make a copy of your /etc/fstab in that directory as well.  You’ll need the bsdlabel output to make sure you’re restoring the dumps to the proper partitions, and the backup of the fstab just for added peace of mind.
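Folded into the backup script, those two safeguards might look like this.  The disk name and paths are from my box, and the bsdlabel(8) call is guarded since it only exists on FreeBSD:

```shell
#!/bin/sh
# Save partition metadata alongside the dumps so a restore can
# recreate the layout. Disk and paths are from my system.
DUMPDIR="/tank/backup/darkhelmet/dumps"
DISK="ad4s1"
if [ -d "${DUMPDIR}" ]; then
        command -v bsdlabel >/dev/null 2>&1 && bsdlabel "${DISK}" > "${DUMPDIR}/bsdlabel.txt"
        cp /etc/fstab "${DUMPDIR}/fstab.txt"
fi
```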

Schedule It

The last step is to put this script into a crontab.  To keep costs down, I’m not running mine very often – I currently have it set up to run once a week, but I may even change that to once every other week.  My box doesn’t change that much, so there’s no need to constantly have up-to-date backups.
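As a sketch, a weekly crontab entry for root might look like this, assuming the script lives where I ran it earlier:

```shell
# crontab entry: run the backup at 3:30 AM every Sunday, logging output
30 3 * * 0 /tank/backup/darkhelmet/dh_backup.sh > /var/log/dh_backup.log 2>&1
```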

Restoring Your System

To restore your system, you’ll need to boot the system in single-user mode (from the hard disk if possible, or from the install or rescue CD, or a custom boot disk, whatever you want).  You’ll also need to download a copy of your dump files.  My plan is to use another machine to download the dump files, and I will place them on a USB storage device.

For each file system (I’ll show an example using my / filesystem) do the following:

Format the filesystem:

newfs -U /dev/ad4s1a

Mount the new filesystem:

mkdir /mnt/newfs
mount /dev/ad4s1a /mnt/newfs

Mount the USB drive:

mkdir /mnt/usb
mount -t msdosfs /dev/da0s1 /mnt/usb

cd to the partition being restored:

cd /mnt/newfs

Restore our backup:

gzcat /mnt/usb/root.dump.gz | restore -rf -

Unmount the filesystems:

cd /
umount /mnt/newfs
umount /mnt/usb

Finally, reboot:

shutdown -r now

Note: I haven’t yet tried this procedure; credit for these steps goes to a post on the FreeBSD forums.

Next Steps

Once you’ve got this procedure set up, you may want to start tweaking.  I plan on playing with dump(8)’s ability to do incremental backups, as this should save on my S3 bandwidth costs.  As it stands right now, I’ll be uploading a fresh copy of the backup every time, which currently totals about 1.5GB.  With S3, that only costs about $0.15 per run, or roughly $0.60 per month on a weekly schedule.  If you have lots of users on your box, or applications that store lots of data, incremental backups may be worth it to save on transfer costs.
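As a sketch of where I might take this (untested, and not my current setup), the dump level could be chosen at run time: a full level-0 dump on the first of the month, level 1 otherwise.  This only prints the command it would run; because the script above passes -u, dump(8) would consult /etc/dumpdates and dump only what changed since the last lower-level dump:

```shell
#!/bin/sh
# Pick a dump level by date: full backup (0) on the first of the month,
# incremental (1) the rest of the time. Prints the command rather than
# running it -- this is an untested sketch, not my current setup.
DUMPDIR="/tank/backup/darkhelmet/dumps"
if [ "$(date +%d)" = "01" ]; then
        LEVEL=0
else
        LEVEL=1
fi
echo "dump -${LEVEL}Lauf - /dev/ad4s1f | gzip > ${DUMPDIR}/usr.${LEVEL}.dump.gz"
```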

If you have other ideas or if I missed anything, please feel free to leave a comment!

Extra Reading

http://www.freebsd.org/doc/en/books/handbook/backup-basics.html