Apart from using Mercurial for version control, I have found a few cases where it is fairly suitable as a ... backup tool.
I maintain a MoinMoin wiki. MoinMoin stores its data in plain disk files. I won't delve into all the details now, but my wiki instance is kept in a directory with subdirectories for configuration data, runtime scripts, plugins and - finally - the wiki data: pages written by users, user preferences, and such.
My initial backup procedure was simple: just
tar the directory, compress it, and copy it to the remote machine. Slightly troublesome and ... giving only the latest version for a possible restore. Plus a noticeable transfer every time a new base backup is made.
Mercurial provided a very nice alternative:
I just created a Mercurial repository inside my wiki directory, added the whole wiki data to it, committed, and configured a cron job to regularly
addremove all files and commit. A cron job running on the backup machine pulls from this repository to grab the updates.
Not only am I keeping an up-to-date, trivially restorable remote backup without hurting my traffic quota (only new changes are copied, and they are tiny compared to the whole content), but also:
- I can revert to the state at any date (since I started applying this procedure),
- I can easily see what has changed without using the wiki interface,
- I feel safe with respect to MoinMoin upgrades - I can branch to test the upgrade in a separate location, and switch users to the migrated version once I am happy with it (and I can always revert if I need to).
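A point-in-time restore can then be sketched like this (the scratch clone path and the date are just examples, not part of the original setup; `hg update` accepts a `--date` option):

```shell
# Make a scratch clone and check out the state as of a given day
# (both the path and the date below are illustrative):
backup$ hg clone /backup/wiki /tmp/wiki-restore
backup$ hg --cwd /tmp/wiki-restore update --date "2008-03-01"
```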
All that at minimal disk cost:
the .hg subdirectory adds only about 80% to the previous directory size (on the
backup machine it takes just this 80%, as I do not need to check out the actual files).
Sweet, isn't it? And this technique works great in many situations (including versioning and backing up
/etc). Below are some technical details for the MoinMoin case.
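The /etc case follows exactly the same pattern; a minimal sketch, assuming Mercurial is installed and we work as root:

```shell
# Turn /etc into a Mercurial repository and record its current state:
root# cd /etc
root# hg init
root# hg addremove
root# hg commit -m "Initial /etc import"
# ...then clone and pull it from the backup machine, as described below.
```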
Install Mercurial (from a package, or via
easy_install Mercurial) - if there are some symlinks in the directory, 0.9.4 or newer is recommended (on both the production and the backup machine).
Go to the wiki data directory and execute:
master$ hg init
It will create a
.hg subdir there. In the (unlikely) case parts of the wiki data dir are accessible via the web (served via apache, for example), make sure this subdir is not. Then create a
.hgignore file, putting there the files which do not need to be backed up. Something like:
syntax: regexp
^(data|underlay)/pages/[^/]+/cache/
^data/(event-log|expaterror.log|error.log)$
\.py[co]$
Of course, tune it according to your directory structure. Above I omit cache directories, less useful logs, and compiled Python files (from the plugins code). Version-controlling them does no harm, but they are not needed and sometimes huge (the logs). If you hate regular expressions, use
syntax: glob and shell-like wildcards (
data/pages/*/cache and so on). Test this file with
hg status (the ignored files should not be reported).
Now let's add files to the repository:
master$ hg add
master$ hg commit -m "Initial import"
Some status-checking commands for the curious:
master$ hg status
master$ hg log
master$ du -sm .hg
Make some wiki edit and try
hg status to see the modifications.
Initial copy (clone)
OK, let's start copying the data to the remote machine. We need some ssh access, either from the production machine to the backup, or from the backup to the production. Whichever direction you pick, configure
authorized_keys so that something like
ssh another.machine ls works without a password prompt.
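If such key-based access is not set up yet, it usually boils down to the following (a sketch; the account name just follows the examples below, and ssh-copy-id is a helper script shipped with OpenSSH):

```shell
# Generate a key pair (use an empty passphrase if cron will use the key),
# install the public key on the other machine, then verify:
backup$ ssh-keygen -t rsa
backup$ ssh-copy-id firstname.lastname@example.org
backup$ ssh firstname.lastname@example.org ls   # no password prompt expected
```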
Now let's make the initial clone. If we can ssh from the backup to the master, it will be:
backup$ hg clone --noupdate ssh://firstname.lastname@example.org//var/lib/wikidir
(note the two slashes after the machine name: they mean an absolute path, a single slash would treat the rest as a path relative to the wiki user's home directory). It will create a directory named
wikidir in the current directory (it can be renamed or moved if needed). Let's also immediately create the file
.hg/hgrc containing something like:
[paths]
master = ssh://firstname.lastname@example.org//var/lib/wikidir
This is just an alias; thanks to it we will later be able to use
hg pull master instead of
hg pull ssh://firstname.lastname@example.org//var/lib/wikidir.
If we can ssh from the master to the backup, we proceed similarly:
master$ hg clone /var/lib/wikidir ssh://firstname.lastname@example.org//backup/wiki
Adapt it to your needs; this example creates a new directory named
/backup/wiki. As above, note the two slashes - if you replace them with one,
$HOME/backup/wiki will be created. And again, make an alias, this time in the master repository's .hg/hgrc:
[paths]
backup = ssh://firstname.lastname@example.org//backup/wiki
Whichever command you used,
/backup/wiki should now contain only the .hg subdirectory, without checked-out files.
This is probably preferable for a backup, but should you want to unpack the wiki files, just:
backup$ cd /backup/wiki
backup$ hg update
(you can safely remove them later with a brutal
rm -rf *, just leave
.hg). Whether they are unpacked or not, you can try commands like:
backup$ cd /backup/wiki
backup$ hg log
backup$ hg status
So far we have just imported and copied a single version. Let's now configure our system to commit the changes regularly. This is just a matter of writing the following shell script:
#!/bin/bash
cd /var/lib/wikidir
hg addremove
hg commit -m "Automatic backup"
and configuring it to be run at regular intervals (I do it early every morning) from cron. Remember to configure it so that it runs from the account owning the files (and the repository). People's preferences vary (anacron, cron.d, ...); here is a simple and safe solution:
$ sudo -u wikiuser crontab -e
4 5 * * * /path/to/wikibackupscript
The Mercurial commands used above have the following meaning:
addremove marks newly added files (found on disk) to be added at the next commit, and marks deleted files for removal.
commit performs the actual commit to the repository.
The final step is to ensure updates are copied between machines.
If we are
ssh-ing from the backup to the master, we just need to run
hg --cwd /backup/wiki pull master
at regular intervals.
The obvious solution is to run it from cron (on the backup machine, using the account which owns
/backup/wiki and is able to ssh to the master). One would usually configure it to run an hour or two after
the committing script runs.
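Such a pull script and its cron entry could look like this (the script name and the 7:04 schedule are just examples, picked to run two hours after the 5:04 commit above):

```shell
#!/bin/bash
# /path/to/wikipullscript (name is illustrative) - grab new changesets
# from the master repository, using the "master" alias from .hg/hgrc.
# Sample crontab entry on the backup machine:
#   4 7 * * * /path/to/wikipullscript
cd /backup/wiki
hg pull master
```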
The pull command grabs all new changesets from the remote repository (master) and saves them in the local repository. Use
pull --update if you want to also update the checked-out copy to the newest version.
If we are
ssh-ing from the master to the backup, the simplest method is to extend the already written wikibackupscript (the one which addremoves and commits) with
hg push --force backup
Alternatively, spawn from cron
hg --cwd /var/lib/wikidir push --force backup
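Put together, the extended master-side script then looks something like this (a sketch combining the earlier steps; note that hg commit exits non-zero when there is nothing new, and the push simply runs regardless):

```shell
#!/bin/bash
# Record local wiki changes and push them to the backup repository
# (the "backup" alias configured in /var/lib/wikidir/.hg/hgrc).
cd /var/lib/wikidir
hg addremove
hg commit -m "Automatic backup"
hg push --force backup
```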
Nothing forces you to stick with just two repositories. Should you like to, you can create more clones (cloning either from the production or from the backup machine, as you prefer, pulling automatically or manually) - for example, to test the MoinMoin upgrade procedure, or to run a development copy for plugin testing.
Having the data version-controlled makes one feel far more comfortable while reconfiguring, rearranging, or upgrading - should anything go wrong, it is always possible to go back.
The same procedure is probably implementable with Git, Bazaar, or Darcs. I recommend Mercurial because its pull/push commands do not attempt any merges, do not update the destination directory (unless asked to), and do nothing except copy the new changesets - so they are safe to use from cron.