For some time I had been looking for backup software for my server. I had a few major requirements:
- integration with Amazon S3 (I wanted to store backups in one of my S3 buckets)
- secure, encrypted backups (my backup data contained all databases and website configurations, so this was important)
- incremental backups (I had many GBs of data, so only the data that changed should be backed up)
Finally, after comparing a few different programs, I chose Duplicity.
Duplicity is open-source software that uses the rsync algorithm, so only the changed parts of files are sent to the archive when doing an incremental backup. It also supports a number of transfer protocols and storage backends, such as FTP, SCP, SSH, WebDAV, IMAP, Google Drive, Tahoe-LAFS, Amazon S3, Backblaze B2 and many more.
Duplicity can encrypt data with a GnuPG key before uploading it to the archive. It’s also extremely easy to use. You can just type:
duplicity [source_path] [target_path]
But let’s start from the beginning.
I used CentOS 7.4 x86_64. The Duplicity package is in the EPEL repository (which you can enable with sudo yum install epel-release):
sudo yum install duplicity
How to use Duplicity
duplicity [full|incremental] [options] source_directory target_url
You can specify whether Duplicity should make a full or an incremental backup. If you specify neither, Duplicity makes an incremental backup when it finds an existing backup at the target, and a full backup otherwise.
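If you would rather pick the mode yourself, here is a minimal sketch. The helper name backup_mode and the rule "full on Sundays, incremental otherwise" are my own assumptions for illustration, not something Duplicity prescribes.

```shell
#!/bin/bash
# backup_mode: hypothetical helper, prints "full" or "incremental".
# Takes an ISO weekday number (1 = Monday ... 7 = Sunday); defaults to today.
backup_mode() {
  local weekday="${1:-$(date +%u)}"
  if [ "$weekday" -eq 7 ]; then
    echo full
  else
    echo incremental
  fi
}

# Example (commented out so nothing runs by accident):
# duplicity "$(backup_mode)" /some/path file:///usr/local/backup
```

The result is passed to duplicity as the first argument, exactly where the synopsis above expects full or incremental.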
Several important options:
--include [shell_pattern] includes the file or files matched by shell_pattern,
--exclude [shell_pattern] similar to above, but excludes matched files/folders instead,
--name [symbolicname] sets the symbolic name of the backup being operated on,
--copy-links resolves symlinks during backup,
--encrypt-key [key-id] id of GPG key. The key-id can be given in any of the formats supported by GnuPG.
I also used a few options which are related to Amazon S3:
--s3-use-new-style uses new-style subdomain bucket addressing
--s3-european-buckets uses European S3 buckets (choose this option if your S3 bucket is in an eu-west/eu-central location)
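When the list of excluded paths grows, it can be handy to build the command from a list. The sketch below only prints the command line instead of running it; build_backup_cmd is a hypothetical helper name, and it uses nothing beyond the --exclude option described above.

```shell
#!/bin/bash
# build_backup_cmd: hypothetical helper that prints a duplicity command line.
# Usage: build_backup_cmd SOURCE TARGET_URL [EXCLUDE_PATH...]
build_backup_cmd() {
  local source_dir="$1" target_url="$2"
  shift 2
  local cmd=(duplicity)
  local path
  # Add one --exclude option per remaining argument
  for path in "$@"; do
    cmd+=(--exclude "$path")
  done
  cmd+=("$source_dir" "$target_url")
  # %q quotes each word so the printed line can be pasted into a shell
  local out
  printf -v out '%q ' "${cmd[@]}"
  echo "${out% }"
}

# Example:
# build_backup_cmd / file:///usr/local/backup /mnt /tmp /proc
```

Printing first and running later makes it easy to review the command before it touches any data.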
Here are a few backup examples:
duplicity /some/path sftp://email@example.com/some_dir
duplicity full /some/path cf+hubic://container_name
duplicity --exclude /mnt --exclude /tmp --exclude /proc / file:///usr/local/backup
duplicity /some/path webdav://user:firstname.lastname@example.org:port/some_dir
To restore an entire backup, just swap source_directory and target_url, e.g.:
duplicity sftp://email@example.com/some_dir /some/path
If you want to restore only one file or folder, use the --file-to-restore option, e.g.:
duplicity --file-to-restore subdir/file sftp://firstname.lastname@example.org/some_dir /some/path/subdir/file
In the above example Duplicity restores only the file given to --file-to-restore.
Setting up Amazon AWS
First, log in to your AWS console. Go to Storage > S3 and create a new bucket. Duplicity can also create the bucket for you, but if you create it yourself you can choose whichever region you want.
Next, create a dedicated IAM user so that the backup credentials are isolated from your main account. Go to Security, Identity & Compliance > IAM. Click Users and then the Add user button. Type a user name and set the AWS access type to Programmatic access only.
In the next step, attach existing policies directly, choosing e.g. AmazonS3FullAccess. Check all the settings on the review page and go to the last step. After the user has been created successfully, you can view and download the user’s security credentials (Access key ID and Secret access key).
Generating GPG keys
Duplicity needs a pair of GPG keys if you want to use encrypted backups. I am going to store the passphrase directly in the backup scripts, so I recommend generating a dedicated key pair only for Duplicity. To do this I used the gpg --gen-key command:
[p0n3@hadron sample]$ gpg --gen-key
gpg (GnuPG) 2.0.22; Copyright (C) 2013 Free Software Foundation, Inc.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Please select what kind of key you want:
   (1) RSA and RSA (default)
   (2) DSA and Elgamal
   (3) DSA (sign only)
   (4) RSA (sign only)
Your selection? 1
RSA keys may be between 1024 and 4096 bits long.
What keysize do you want? (2048) 2048
Requested keysize is 2048 bits
Please specify how long the key should be valid.
         0 = key does not expire
      <n>  = key expires in n days
      <n>w = key expires in n weeks
      <n>m = key expires in n months
      <n>y = key expires in n years
Key is valid for? (0) 0
Key does not expire at all
Is this correct? (y/N) y
GnuPG needs to construct a user ID to identify your key.
Real name: Sample Name
Email address: email@example.com
Comment:
You selected this USER-ID:
    "Sample Name <email@example.com>"
Change (N)ame, (C)omment, (E)mail or (O)kay/(Q)uit? O
You need a Passphrase to protect your secret key.
We need to generate a lot of random bytes. It is a good idea to perform
some other action (type on the keyboard, move the mouse, utilize the
disks) during the prime generation; this gives the random number
generator a better chance to gain enough entropy.
.............+++++
gpg: key [key_id] marked as ultimately trusted
public and secret key created and signed.
gpg: checking the trustdb
gpg: 3 marginal(s) needed, 1 complete(s) needed, PGP trust model
gpg: depth: 0 valid: 1 signed: 0 trust: 0-, 0q, 0n, 0m, 0f, 1u
pub 2048R/[key_id] 2017-12-31
    Key fingerprint = [cut]
sub 2048R/[cut] 2017-12-31
Remember: if you lose your passphrase or key file, you lose access to your backups. So it is important to export and back up your key (use the gpg --export-secret-keys command).
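A minimal sketch of such an export could look like the function below. The function name export_backup_keys and the output file names are my assumptions; the gpg options --armor, --export and --export-secret-keys are standard GnuPG options.

```shell
#!/bin/bash
# export_backup_keys: hypothetical helper that writes the public and the
# secret key for KEY_ID into OUT_DIR as ASCII-armored files.
# Usage: export_backup_keys KEY_ID OUT_DIR
export_backup_keys() {
  local key_id="$1" out_dir="$2"
  mkdir -p "$out_dir" || return 1
  # The subshell keeps the stricter umask local to this export,
  # so the new files are readable by their owner only.
  (
    umask 077
    gpg --armor --export "$key_id" > "$out_dir/duplicity-public.asc" &&
    gpg --armor --export-secret-keys "$key_id" > "$out_dir/duplicity-secret.asc"
  )
}

# Example (commented out):
# export_backup_keys [key_id] /root/gpg-export
```

Keep the exported files somewhere other than the server you are backing up, otherwise they are lost together with it.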
Create a backup to S3 using GPG keys
Now let’s put everything together:
export AWS_ACCESS_KEY_ID=[aws_access_key_id]
export AWS_SECRET_ACCESS_KEY=[aws_secret_key_id]
PASSPHRASE="[passphrase]" duplicity \
  --s3-use-new-style --s3-european-buckets \
  --exclude /home/sample/logs \
  --name backup_sample \
  --encrypt-key [key_id] \
  /home/sample s3://s3-eu-west-1.amazonaws.com/p0n3-sample-backup
If everything went well, you should see:
Synchronizing remote metadata to local cache...
Last full backup date: none
No signatures found, switching to full backup.
--------------[ Backup Statistics ]--------------
StartTime 1514766005.94 (Mon Jan 1 01:20:05 2018)
EndTime 1514766104.99 (Mon Jan 1 01:21:44 2018)
ElapsedTime 279.05 (3 minutes 39.05 seconds)
SourceFiles 80063
SourceFileSize 15523299677 (14.5 GB)
NewFiles 80063
NewFileSize 15523299677 (14.5 GB)
DeletedFiles 0
ChangedFiles 0
ChangedFileSize 0 (0 bytes)
ChangedDeltaSize 0 (0 bytes)
DeltaEntries 945
RawDeltaSize 7498549 (7.15 MB)
TotalDestinationSizeChange 4338347 (4.14 MB)
Errors 0
----------------------------------------------------------
If you change something in the /home/sample/* files and run the command again, you will see that Duplicity now makes an incremental backup.
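To check what has actually been stored, Duplicity has two read-only actions, collection-status and list-current-files. The wrapper below is only a sketch: the function name show_backup_state is hypothetical, and I assume the same environment variables are exported as for the backup command.

```shell
#!/bin/bash
# show_backup_state: hypothetical wrapper around two read-only duplicity actions.
# Usage: show_backup_state BACKUP_NAME TARGET_URL
# Assumes AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY and PASSPHRASE are exported.
show_backup_state() {
  local name="$1" target_url="$2"
  # Lists the backup chains (a full backup plus its incrementals)
  duplicity collection-status --name "$name" \
    --s3-use-new-style --s3-european-buckets "$target_url" || return 1
  # Lists the files contained in the most recent backup
  duplicity list-current-files --name "$name" \
    --s3-use-new-style --s3-european-buckets "$target_url"
}

# Example (commented out):
# show_backup_state backup_sample s3://s3-eu-west-1.amazonaws.com/p0n3-sample-backup
```

After the second run, the collection status should show one full backup followed by one incremental set.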
If you look at the bucket in the AWS S3 console, you should now see the files that Duplicity uploaded.
Set up automatic daily backups
OK, now I’ll create a bash script for the daily backups (the directory to back up is set in PATH_BACKUP):
[root@hadron ~]$ mkdir backups
[root@hadron ~]$ cd backups
[root@hadron backups]$ nano backup_www.sh
#!/bin/bash
export AWS_ACCESS_KEY_ID=[aws_access_key_id]
export AWS_SECRET_ACCESS_KEY=[aws_secret_key_id]
AWS_BUCKET_NAME=p0n3-www-backup
PATH_BACKUP=/var/www
PASSPHRASE="[gpg_passphrase]" duplicity \
  --s3-use-new-style --s3-european-buckets \
  --exclude /var/www/rails-app/log \
  --name backup_www \
  --encrypt-key [gpg_key_id] \
  $PATH_BACKUP s3://s3-eu-west-1.amazonaws.com/$AWS_BUCKET_NAME
The backup script contains our AWS_SECRET_ACCESS_KEY and GPG passphrase, so I highly recommend restricting its permissions:
[root@hadron backups]$ chmod 700 backup_www.sh
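If you want to double-check the permissions later, for example after copying the scripts to another machine, a small test like the one below can help. This is just a sketch: is_private is a hypothetical helper name, and it relies on GNU stat as shipped with CentOS.

```shell
#!/bin/bash
# is_private: hypothetical check that succeeds only when FILE grants no
# permissions to group or others (for example mode 700 or 600).
# Returns 2 if the file cannot be inspected.
is_private() {
  local mode
  mode=$(stat -c '%a' "$1") || return 2
  # 8#077 masks the group and others permission bits
  (( (8#$mode & 8#077) == 0 ))
}

# Example:
# is_private backup_www.sh || echo "backup_www.sh is readable by other users"
```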
Additionally, you can create a script to restore a file or folder. It’s better to do it now than in a crisis situation 😉
[root@hadron backups]$ touch restore.sh
[root@hadron backups]$ chmod 700 restore.sh
[root@hadron backups]$ nano restore.sh
I wanted to use several backup configurations, so my script takes two parameters: the first is the backup name and the second is the file or folder to restore. The second parameter is optional (if it is omitted, the entire backup is restored).
#!/bin/bash
# Usage: restore.sh <backup_name> [file_or_folder]
export AWS_ACCESS_KEY_ID=[aws_access_key_id]
export AWS_SECRET_ACCESS_KEY=[aws_secret_key_id]
# Same bucket naming as in backup_www.sh (p0n3-www-backup)
AWS_BUCKET_NAME=p0n3-$1-backup
RESTORE_PATH=/root/backup/restore
FILE_TO_RESTORE=
if [ $# -eq 2 ]
then
  # single file/folder restore
  FILE_TO_RESTORE="--file-to-restore $2"
  RESTORE_PATH=$RESTORE_PATH/$2
fi
# $FILE_TO_RESTORE is left unquoted on purpose so that it splits into option and value
PASSPHRASE="[gpg_passphrase]" duplicity \
  --s3-use-new-style --s3-european-buckets \
  --name backup_$1 $FILE_TO_RESTORE \
  s3://s3-eu-west-1.amazonaws.com/$AWS_BUCKET_NAME $RESTORE_PATH
[root@hadron backups]$ ./restore.sh www domains/p0n3.net/file
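A restore script that keeps the optional option in a plain string breaks as soon as a path contains spaces. A more robust variant, sketched here under my own assumptions (the function name build_restore_args and the global array RESTORE_ARGS are hypothetical), collects the arguments in a bash array instead.

```shell
#!/bin/bash
# build_restore_args: hypothetical helper that fills the global array
# RESTORE_ARGS with the duplicity arguments for a restore.
# Usage: build_restore_args TARGET_URL RESTORE_PATH [FILE_TO_RESTORE]
build_restore_args() {
  local target_url="$1" restore_path="$2" file="${3:-}"
  RESTORE_ARGS=()
  if [ -n "$file" ]; then
    # Each array element stays one argument, even if it contains spaces
    RESTORE_ARGS+=(--file-to-restore "$file")
    restore_path="$restore_path/$file"
  fi
  RESTORE_ARGS+=("$target_url" "$restore_path")
}

# Example (commented out):
# build_restore_args "$TARGET_URL" /root/backup/restore "my dir/file"
# PASSPHRASE="[gpg_passphrase]" duplicity "${RESTORE_ARGS[@]}"
```

Expanding the array as "${RESTORE_ARGS[@]}" passes every element to duplicity as exactly one argument.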
Schedule a backup
You can add an entry to /etc/crontab or add the script to /etc/cron.daily. However, I prefer to use a separate file in /etc/cron.d:
[root@hadron backups]$ nano /etc/cron.d/backup
Let’s assume that you have 3 backup configurations. They should run at 1:10, 1:20 and 2:20 every day.
SHELL=/bin/bash
PATH=/sbin:/bin:/usr/sbin:/usr/bin
MAILTO=root
10 1 * * * root /root/backups/backup_db.sh
20 1 * * * root /root/backups/backup_www.sh
20 2 * * * root /root/backups/backup_mail.sh
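With MAILTO set, cron mails the output of each job to root. If you would also like to keep a log file, a small wrapper like the one below could be placed at the top of a backup script. It is only a sketch: the function name run_logged and the log path in the example are my assumptions.

```shell
#!/bin/bash
# run_logged: hypothetical wrapper that runs a command, appends its output
# to LOG_FILE between start/end markers, and returns the command's exit status.
# Usage: run_logged LOG_FILE COMMAND [ARGS...]
run_logged() {
  local log_file="$1"
  shift
  local status
  echo "=== $(date '+%F %T') start: $*" >> "$log_file"
  "$@" >> "$log_file" 2>&1
  status=$?
  echo "=== $(date '+%F %T') end: exit status $status" >> "$log_file"
  return "$status"
}

# Example (commented out):
# run_logged /var/log/backup_www.log duplicity /some/path file:///usr/local/backup
```

Because the wrapper returns the exit status of the command, a failed backup still makes the script fail.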
And that’s it. Duplicity is extremely easy to use. After a few minutes of configuration you have a daily backup, and you can restore any file whenever you want. By the way: I strongly recommend keeping a copy of the bash scripts you wrote somewhere off the server, in case of emergency.