The hardest thing about backups is not preventing data loss but preventing data leakage, either by accident or by targeted attack.
Here are my personal requirements for a backup solution:
- Locally encrypted. The solution must natively support modern authenticated encryption. File metadata must also be encrypted. The archive structure must make it impossible to infer whether specific files exist in a backup (for example, based on size proximity).
- Offsite. Backing up to an external HDD or NAS is a no-go: that wouldn’t protect against malware, theft, fire or confiscation by law enforcement. Backups must be sent offsite to reputable, “privacy sensitive” storage.
- Fully automated. Any requirement for human action makes a solution unrealistic: backups would be postponed and eventually abandoned if they add any friction to an otherwise happy day.
- Open source. This is critical to facilitate auditing, including reviewing the soundness of the cryptography.
- Linux native.
- Incremental. It must be very efficient transfer-wise and storage-wise. Low transfer requirements matter because home internet connections have limited uplink. Low storage requirements matter because cloud storage is expensive and the intention here is to keep a large number of snapshots.
- Historical snapshots. The snapshots must be available and easy to manage and prune. It should be possible to recover individual files from many months ago.
- Root filesystem compatible. The solution must support backing up the whole root filesystem (excluding devices, external file systems, and other irrelevant pieces).
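The size-proximity requirement guards against a simple attack that never touches the encryption. The sketch below assumes a naive tool that encrypts each file separately, so each ciphertext is the plaintext size plus a fixed overhead. The 32-byte overhead, the file names and the sizes are all hypothetical. An attacker holding sizes of publicly known files can match them against observed ciphertext sizes.

```python
from typing import Dict, List, Tuple

# Hypothetical fixed overhead of a naive per-file tool: a 16-byte IV plus a 16-byte MAC tag.
OVERHEAD = 32


def naive_ciphertext_size(plaintext_size: int) -> int:
    """Size of one file encrypted on its own, with no chunking, compression or padding."""
    return plaintext_size + OVERHEAD


def correlate_by_size(
    observed: List[int], known_files: Dict[str, int], tolerance: int = 0
) -> List[Tuple[int, str]]:
    """Return (observed_size, known_name) pairs whose sizes agree within `tolerance`
    bytes once the fixed overhead is subtracted. No decryption is involved."""
    matches = []
    for size in observed:
        for name, known_size in known_files.items():
            if abs((size - OVERHEAD) - known_size) <= tolerance:
                matches.append((size, name))
    return matches


# Hypothetical example: sizes of publicly available files the attacker is looking for.
known = {"leaked-document.pdf": 1_048_321, "popular-video.mp4": 734_003_200}
observed = [naive_ciphertext_size(1_048_321), naive_ciphertext_size(4_096)]
print(correlate_by_size(observed, known))  # [(1048353, 'leaked-document.pdf')]
```

Splitting files into independently stored chunks, compressing them and padding them all blur this signal. How far a particular archive format goes in that direction is worth checking in the tool's own documentation.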
Backups as Attack Vector
Unfortunately, backups are very hard to set up in a safe and privacy-preserving way.
The only thing worse than losing your data is leaking your data. Imagine your password manager database, PGP keys, SSH keys, cryptocurrency wallets, 15 years of Thunderbird email history, all leaked through backups.
In the case of the average Joe, that would be his browser history and porn habits, which would redefine him as a person (and probably not for the better).
Accidental backup leakage
Example 1: local HDD/NAS backups are confiscated by law enforcement or stolen. The owner had not employed encryption because, well, the backups were supposed to be stored safely in their apartment.
Example 2: same as above, but the user employed “strong encryption”. Unfortunately, his password is trivial to brute force using leaked password databases. Even without brute forcing the password, attackers can exploit his mistake of encrypting file contents while leaving metadata in the clear, which lets them easily correlate his files with known content. Finally, he randomly picked AES in ECB mode, having no idea that ECB encrypts identical plaintext blocks to identical ciphertext blocks. The structure of his images stays recognizable, which makes his porn content trivial to make out.
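The pattern leak in Example 2 comes from determinism. If a mode of operation transforms every block independently with the same key, as ECB does, identical plaintext blocks produce identical ciphertext blocks, and large uniform regions of an image stay visible. Python's standard library has no AES, so this minimal sketch substitutes a stand-in block function: HMAC-SHA256 truncated to 16 bytes. That function is not invertible and is not a real cipher. It serves only to count repeated blocks under an ECB-style construction and a CTR-style construction.

```python
import hashlib
import hmac
import os

BLOCK = 16  # AES block size in bytes


def prf(key: bytes, data: bytes) -> bytes:
    """Stand-in block function: HMAC-SHA256 truncated to one block.
    NOT a real block cipher (it cannot be inverted); used only to show patterns."""
    return hmac.new(key, data, hashlib.sha256).digest()[:BLOCK]


def ecb_like(key: bytes, plaintext: bytes) -> list:
    """Transform each block independently and deterministically, like ECB."""
    return [prf(key, plaintext[i:i + BLOCK]) for i in range(0, len(plaintext), BLOCK)]


def ctr_like(key: bytes, nonce: bytes, plaintext: bytes) -> list:
    """XOR each block with prf(nonce || counter), like CTR mode."""
    out = []
    for counter, i in enumerate(range(0, len(plaintext), BLOCK)):
        block = plaintext[i:i + BLOCK]
        keystream = prf(key, nonce + counter.to_bytes(8, "big"))
        out.append(bytes(a ^ b for a, b in zip(block, keystream)))
    return out


key = os.urandom(32)
# An "image" that is one flat colour: 64 identical 16-byte blocks.
plaintext = b"\xff" * (BLOCK * 64)

ecb_blocks = ecb_like(key, plaintext)
ctr_blocks = ctr_like(key, os.urandom(8), plaintext)
print(len(set(ecb_blocks)))  # 1: every ciphertext block is the same
print(len(set(ctr_blocks)))  # 64 distinct blocks (a collision is astronomically unlikely)
```

For comparison, CBC with a random IV feeds each ciphertext block into the next one, so it also hides this particular repetition. The classic "the picture shows through" weakness belongs to ECB.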
Example 3: the user picked a popular commercial-grade solution like Crashplan, Acronis or Carbonite. However, she went with the default or easiest path and did not configure local encryption. All her data leaked to the cloud and is impossible to delete reliably. Hundreds of current and future employees can access the data.
Example 4: same as above, but the user turned on “strong encryption” in the software. She couldn’t make many mistakes because all the complexity is handled by the software. Some years down the road, it turned out the software’s encryption implementation was severely flawed, as is most often the case(!). All users’ data is freely available in the cloud. The software was never openly audited because it is proprietary.
Then there is the “unlikely” event where a government kindly asks Crashplan/Acronis/Carbonite to introduce a subtle flaw via its software update mechanism, either for all users or for specific ones. The flaw may simply send the local encryption key to the company’s servers as part of the backup.
Make no mistake: these companies state explicitly in their ToS that they will share your data with the government when required by law. They also don’t offer a warrant canary!
The Solution: Borg Backup + Borg Helper + Rsync.net
Borg Backup is proven, mature and reliable open source backup software. It is Linux-first and ticks all the boxes, offering local authenticated encryption, advanced de-duplication, compression, efficient remote backups, snapshots, auto-pruning, smart exclusions, etc.
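Borg de-duplicates variable-size chunks. A rolling hash over the content (Borg uses buzhash) picks the chunk boundaries, so inserting a few bytes near the start of a file shifts only nearby boundaries instead of every fixed-size block after it. The following is a simplified sketch of that idea, not Borg's chunker. It uses a toy polynomial rolling hash over a 48-byte window, hypothetical size limits, and a plain dict standing in for the repository. It also keys the chunk IDs with HMAC-SHA256, so the stored IDs are not plain content hashes.

```python
import hashlib
import hmac
import os
import random
from typing import Dict, List

WINDOW = 48                      # bytes covered by the rolling hash (toy value)
BASE, MOD = 257, (1 << 61) - 1
MASK = (1 << 11) - 1             # cut where the low 11 bits are zero: ~1 cut per 2 KiB
MIN_CHUNK, MAX_CHUNK = 512, 16_384


def chunk(data: bytes) -> List[bytes]:
    """Split data at content-defined boundaries using a polynomial rolling hash."""
    chunks: List[bytes] = []
    start, h = 0, 0
    drop = pow(BASE, WINDOW - 1, MOD)  # weight of the byte leaving the window
    for i, byte in enumerate(data):
        if i - start >= WINDOW:
            h = (h - data[i - WINDOW] * drop) % MOD
        h = (h * BASE + byte) % MOD
        size = i - start + 1
        if (size >= MIN_CHUNK and (h & MASK) == 0) or size >= MAX_CHUNK:
            chunks.append(data[start:i + 1])
            start, h = i + 1, 0
    if start < len(data):
        chunks.append(data[start:])
    return chunks


def store(repo: Dict[str, bytes], id_key: bytes, data: bytes) -> List[str]:
    """Add data to a dict standing in for a repository; known chunks are not stored again."""
    ids = []
    for piece in chunk(data):
        cid = hmac.new(id_key, piece, hashlib.sha256).hexdigest()  # keyed chunk ID
        repo.setdefault(cid, piece)
        ids.append(cid)
    return ids


rng = random.Random(0)
original = bytes(rng.getrandbits(8) for _ in range(200_000))
edited = original[:1_000] + b"a few inserted bytes" + original[1_000:]

repo: Dict[str, bytes] = {}
id_key = os.urandom(32)
v1 = store(repo, id_key, original)
known = len(repo)
v2 = store(repo, id_key, edited)
new_chunks = len(repo) - known
print(len(v1), len(v2), new_chunks)  # new_chunks is typically only a handful
```

The second version of the file costs only the few chunks around the edit, which is what makes many historical snapshots cheap in transfer and storage.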
Borg developers appear to understand the intricacies of modern encryption and its typical pitfalls. They dedicated a whole chapter of the documentation to potential attack vectors and the mitigations they implemented.
I took a quick peek at the code and found no obvious problems with the encryption, although I am nowhere close to a cryptographer and the review was nowhere close to exhaustive:
- it relies on the AES-256 implementation from OpenSSL because AES is not present in the Python standard library; borg authors apparently did not want to use external modules like pycrypto; this looks like a good conservative choice; borg statically links to OpenSSL’s libcrypto
- CTR mode of operation; for CTR, the nonce must never repeat for the same secret key; borg enforces this via NonceManager, which stores used nonces both locally and in the remote repository
- it uses the recommended encrypt-then-MAC approach, with either SHA256 or Blake2 as the MAC hash (the user’s choice)
- SHA256 comes from the Python standard library (hashlib); Blake2 is the reference C implementation by one of the Blake2 authors, Samuel Neves
- the MAC is calculated over aad + iv + ciphertext which is what one would expect
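The construction in that list has three parts: a counter-mode cipher, a nonce that never repeats under one key, and a MAC over aad + iv + ciphertext that is checked before anything is decrypted. The sketch below illustrates it but is not Borg's code. Python's standard library has no AES, so a SHA-256-based keystream stands in for AES-256-CTR. The `NonceCounter` class is a hypothetical simplification of the NonceManager idea: never hand out a nonce at or below the highest one recorded either locally or in the repository.

```python
import hashlib
import hmac
import os


class NonceCounter:
    """Hypothetical nonce source. Starting above the highest nonce recorded in either
    the local cache or the remote repository means a lost or stale local copy
    cannot cause nonce reuse under the same key."""

    def __init__(self, local_max: int, remote_max: int):
        self.next = max(local_max, remote_max) + 1

    def take(self) -> int:
        nonce = self.next
        self.next += 1  # a real tool would persist this both locally and remotely
        return nonce


def keystream(key: bytes, iv: bytes, length: int) -> bytes:
    """Stand-in for AES-256-CTR: SHA-256(key || iv || counter) blocks. Illustration only."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + iv + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


def seal(enc_key: bytes, mac_key: bytes, nonce: int, plaintext: bytes, aad: bytes = b"") -> bytes:
    """Encrypt-then-MAC: encrypt first, then authenticate aad + iv + ciphertext."""
    iv = nonce.to_bytes(16, "big")
    ct = bytes(p ^ k for p, k in zip(plaintext, keystream(enc_key, iv, len(plaintext))))
    tag = hmac.new(mac_key, aad + iv + ct, hashlib.sha256).digest()
    return tag + iv + ct


def open_sealed(enc_key: bytes, mac_key: bytes, blob: bytes, aad: bytes = b"") -> bytes:
    """Verify the tag in constant time and only then decrypt."""
    tag, iv, ct = blob[:32], blob[32:48], blob[48:]
    expected = hmac.new(mac_key, aad + iv + ct, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("authentication failed; refusing to decrypt")
    return bytes(c ^ k for c, k in zip(ct, keystream(enc_key, iv, len(ct))))


enc_key, mac_key = os.urandom(32), os.urandom(32)
nonces = NonceCounter(local_max=41, remote_max=57)  # the remote record is ahead of the local one
blob = seal(enc_key, mac_key, nonces.take(), b"secret chunk", aad=b"\x00")
print(open_sealed(enc_key, mac_key, blob, aad=b"\x00"))  # b'secret chunk'
```

Checking the tag before touching the ciphertext is the point of encrypt-then-MAC. A tampered or truncated blob is rejected without ever being decrypted.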
Summing up, borg appears to take privacy very seriously.
Borg is super flexible, but it does not cover “the last mile” some users would expect: things like periodic running, a default opinionated setup, and desktop notification integration. This last mile is covered by Borg Helper.
Borg Helper was born out of my own research, choices and setup. It is a thin convenience layer over Borg. It adds periodic running and verification via systemd timers. It also offers an opinionated borg configuration for pragmatic remote backups with a focus on security and efficiency. It is a handful of short scripts that you can easily audit and modify to your needs.
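Borg Helper's own scripts are not reproduced here. The following hypothetical sketch shows the kind of commands such a thin layer might run on each timer tick for a whole-root backup. It only assembles `borg create`, `borg prune` and `borg check` command lines and prints them in dry-run mode. Every value in it is an assumption to adapt: the repository location, the `--remote-path=borg1` value (the name under which the server is assumed to expose borg; check rsync.net's current documentation), the exclude patterns and the retention counts.

```python
import shlex
import subprocess
from typing import List, Sequence

# All values below are hypothetical placeholders; adapt them to your account and system.
REPO = "user@example.rsync.net:borg/laptop"   # scp-style borg 1.x repository location
REMOTE_PATH = "borg1"                          # assumption: borg binary name on the server
EXCLUDES = ["/tmp", "/var/tmp", "/var/cache", "/home/*/.cache"]


def borg(*args: str) -> List[str]:
    """Common prefix: the remote borg binary is passed as a common option."""
    return ["borg", f"--remote-path={REMOTE_PATH}", *args]


def create_cmd(repo: str = REPO, excludes: Sequence[str] = EXCLUDES) -> List[str]:
    cmd = borg(
        "create",
        "--stats",
        "--compression", "zstd,3",
        "--one-file-system",  # do not descend into other mounts (/proc, /sys, /dev, USB drives)
        "--exclude-caches",   # skip directories tagged with a CACHEDIR.TAG file
    )
    for pattern in excludes:
        cmd += ["--exclude", pattern]
    # {hostname} and {now} are placeholders that borg expands when naming the archive.
    return cmd + [f"{repo}::{{hostname}}-{{now}}", "/"]


def prune_cmd(repo: str = REPO, daily: int = 7, weekly: int = 4, monthly: int = 12) -> List[str]:
    return borg("prune", "--list", f"--keep-daily={daily}",
                f"--keep-weekly={weekly}", f"--keep-monthly={monthly}", repo)


def check_cmd(repo: str = REPO) -> List[str]:
    return borg("check", repo)


def run(dry_run: bool = True) -> None:
    for cmd in (create_cmd(), prune_cmd(), check_cmd()):
        if dry_run:
            print(shlex.join(cmd))
        else:
            subprocess.run(cmd, check=True)  # stop at the first failing step


run(dry_run=True)
```

Keeping the commands as plain lists makes them easy to review before anything touches the repository. Scheduling itself would be left to a systemd timer.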
We still need remote storage for our backups with server-side support for Borg.
Rsync.net is a very interesting choice. It seems very conservative regarding security and privacy, runs on FreeBSD and offers a weekly warrant canary.
The price is also reasonable at $0.02 per GB per month. Rsync.net has proved reliable and fast in my experience.
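At a flat per-GB monthly price, de-duplication is what keeps a large number of snapshots affordable. This back-of-the-envelope sketch compares two cases: storing every snapshot as a full copy, and storing one base copy plus only the changed data of each later snapshot. The 200 GB base, 1 GB of churn per snapshot and 90 retained snapshots are hypothetical numbers, and compression is ignored.

```python
from decimal import Decimal

PRICE_PER_GB_MONTH = Decimal("0.02")  # price quoted in the text


def monthly_cost(stored_gb: Decimal) -> Decimal:
    """Monthly bill in dollars, rounded to cents."""
    return (stored_gb * PRICE_PER_GB_MONTH).quantize(Decimal("0.01"))


def stored_gb(base_gb: Decimal, churn_gb: Decimal, snapshots: int, dedup: bool = True) -> Decimal:
    if dedup:
        # one full copy plus only the changed data of each later snapshot
        return base_gb + churn_gb * (snapshots - 1)
    return base_gb * snapshots  # every snapshot stored as a full copy


base, churn, n = Decimal(200), Decimal(1), 90  # hypothetical workload
print(monthly_cost(stored_gb(base, churn, n)))               # 5.78
print(monthly_cost(stored_gb(base, churn, n, dedup=False)))  # 360.00
```

Decimal avoids binary floating-point rounding in money arithmetic.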
The big minus is that one cannot pay for rsync.net anonymously with Monero to decouple the data from one’s identity.
All in all though, rsync.net seems the most secure among popular borg-servers.
No setup is perfect. Some remaining issues to be aware of (many of them not specific to borg):
Must back up your decryption key separately. The root partition backup is as complete as one could get, but to unlock it you need the AES-256 secret key. This key should be kept in your password manager, and you must back it up independently of your borg backup. One way to do it is to store an encrypted password manager database in a bank safe deposit box and/or in your wallet. Of course, this is not specific to borg: it is common to any encrypted backup solution, since you need to store the decryption key somewhere.
Borg is a SPOF in terms of security, as are many other components in your system, BTW. Borg is trusted software that runs as root to back up the root partition. Should borg ever be compromised, the whole scheme breaks. The only mitigations are that 1) Borg is open source and quite popular, and 2) Borg is officially packaged and the packages are signed (at least on Arch Linux).
Above-the-filesystem backups are inherently inconsistent. As with any above-the-filesystem backup solution, backups cannot be guaranteed to be consistent. This is because a backup is not a point in time but a process carried out while the underlying files may be modified. In practice, this is most evident with VM images when a VM is running while its drive is being backed up. The only definitive solution would be to use a filesystem snapshotting feature to create a point-in-time version of reality. Unfortunately, ZFS support on Linux is still far from stellar, and there are other trade-offs to the below-the-filesystem backup approach. The workaround is to back up more frequently: very often the previous snapshot will have the data in question in a consistent state (unless you run your VMs or DBs 24/7 on the desktop).
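Because a backup run reads each file over a period of time, a tool can at least notice when a file changed underneath it and treat that copy as suspect. This minimal sketch assumes that a difference in size or modification time between the `stat()` taken before reading and the one taken after means the copy may be inconsistent. A same-size rewrite within the file system's timestamp granularity would go unnoticed. The `after_first_block` hook is hypothetical and exists only to simulate a concurrent writer such as a running VM.

```python
import os
import tempfile
from typing import Callable, Optional, Tuple


def read_with_change_check(
    path: str,
    chunk_size: int = 1 << 20,
    after_first_block: Optional[Callable[[], None]] = None,
) -> Tuple[bytes, bool]:
    """Read a whole file and report whether it appears to have changed meanwhile."""
    before = os.stat(path)
    parts = []
    with open(path, "rb") as f:
        block = f.read(chunk_size)
        if block and after_first_block is not None:
            after_first_block()  # test hook: simulate another process writing
        while block:
            parts.append(block)
            block = f.read(chunk_size)
    after = os.stat(path)
    changed = (before.st_size, before.st_mtime_ns) != (after.st_size, after.st_mtime_ns)
    return b"".join(parts), changed


with tempfile.TemporaryDirectory() as tmp:
    image = os.path.join(tmp, "vm-disk.img")
    with open(image, "wb") as f:
        f.write(b"\0" * 4096)

    _, changed = read_with_change_check(image)
    print(changed)  # False: nothing touched the file

    def running_vm() -> None:
        with open(image, "ab") as f:
            f.write(b"new guest data")

    _, changed = read_with_change_check(image, chunk_size=1024, after_first_block=running_vm)
    print(changed)  # True: the file grew while it was being read
```

Detection does not make the copy consistent. It only tells you to rely on an earlier snapshot, or the next run, for that file.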
Won’t protect against targeted malware. This setup is safe against generic malware that, for example, encrypts all network drives to demand Bitcoin. The borg repository is not a network drive; it is accessed through a custom protocol over SSH. However, should malware be written specifically to access resources over SSH, the backup could still be removed or re-encrypted. The mitigation would be to further copy (not sync!) the repository to non-SSH cloud storage, but that increases complexity and costs, so I chose to accept the trade-offs as they are.
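The “copy (not sync!)” distinction is the crux of that mitigation. A sync mirrors whatever the source looks like now, including deletions and re-encrypted files. This minimal sketch assumes the second location is simply another directory. The hypothetical `copy_new_only` helper adds files the destination does not have yet and never overwrites or deletes, so a later wipe of the primary repository does not propagate to it.

```python
import os
import shutil
import tempfile
from typing import List


def sync_mirror(src: str, dst: str) -> None:
    """What a sync does: make dst an exact mirror of src, deletions included."""
    if os.path.exists(dst):
        shutil.rmtree(dst)
    shutil.copytree(src, dst)


def copy_new_only(src: str, dst: str) -> List[str]:
    """Copy only files that dst does not have yet; never overwrite, never delete."""
    added = []
    for root, _dirs, files in os.walk(src):
        target_dir = os.path.normpath(os.path.join(dst, os.path.relpath(root, src)))
        os.makedirs(target_dir, exist_ok=True)
        for name in files:
            target = os.path.join(target_dir, name)
            if not os.path.exists(target):
                shutil.copy2(os.path.join(root, name), target)
                added.append(os.path.relpath(target, dst))
    return added


with tempfile.TemporaryDirectory() as tmp:
    repo, mirror, vault = (os.path.join(tmp, n) for n in ("repo", "mirror", "vault"))
    os.makedirs(os.path.join(repo, "data"))
    with open(os.path.join(repo, "data", "1"), "wb") as f:
        f.write(b"segment written by an earlier backup")
    sync_mirror(repo, mirror)
    copy_new_only(repo, vault)

    # Targeted malware wipes the primary repository over SSH...
    shutil.rmtree(os.path.join(repo, "data"))
    os.makedirs(os.path.join(repo, "data"))
    # ...and the next scheduled run propagates the damage only to the mirror.
    sync_mirror(repo, mirror)
    copy_new_only(repo, vault)
    print(os.path.exists(os.path.join(mirror, "data", "1")))  # False
    print(os.path.exists(os.path.join(vault, "data", "1")))   # True
```

The price of this design is that the copy only ever grows, which is part of the extra cost and complexity mentioned above.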
The future is probably not going to get better, with real-life disasters caused by internet-connected knick-knacks, smart home robots that could kill you, and your telecom providers who routinely lose customer data and unwittingly help hackers steal your phone number (and sometimes your money). Meanwhile, an ever-growing and increasingly passive surveillance apparatus that has trickled down to state and local police is an ever-present threat to our digital privacy and increasingly uses technology that is developed by Silicon Valley giants who are supposedly consumer-focused.
Rsync.net is assumed to be semi-trusted. The setup relies on rsync.net remaining available and not removing the data. It also relies on rsync.net never revealing the data, as a second layer of privacy on top of encryption (because the encryption could potentially be broken).
Backups are hard. Privacy preserving backups are even harder.
This article presents a complete, ready-to-use recipe for privacy-preserving personal backups on Linux: Borg + Borg Helper + Rsync.net.
Author is not affiliated with Rsync.net and links are not referral.