Today I'm making a change in the Pinboard terms of service regarding backups.
For the first few years of running the site, my backup policy was pretty simple: dump the full database every night and upload that file to S3. That worked fine while Pinboard was small, but it's not a good solution now that Pinboard is medium-sized.
The full database currently takes up 153 GB of disk space and compresses to a 17 GB backup file. The site also stores eight terabytes of crawled page content, which compresses to roughly half that size.
When data gets big like this, the real problem is not where to keep it, but how to move it.
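To put rough numbers on that, here is a back-of-the-envelope sketch in Python. It assumes a hypothetical sustained 100 Mbit/s uplink (an illustrative figure, not Pinboard's actual bandwidth) and decimal units. It then estimates how long the 17 GB compressed database dump and the roughly 4 TB of compressed crawl data would each take to upload.

```python
# Back-of-the-envelope upload times for the backup sizes quoted above.
# ASSUMPTIONS: a hypothetical sustained 100 Mbit/s uplink and decimal units
# (1 GB = 10**9 bytes, 1 TB = 10**12 bytes). Neither comes from the post.

GB = 10**9
TB = 10**12


def transfer_hours(size_bytes: float, link_mbit_per_s: float) -> float:
    """Hours needed to push size_bytes over a link of link_mbit_per_s megabits/s."""
    bits = size_bytes * 8
    seconds = bits / (link_mbit_per_s * 10**6)
    return seconds / 3600


LINK_MBIT = 100  # hypothetical link speed, for illustration only

db_backup = 17 * GB         # compressed nightly database dump
crawl_archive = 8 * TB / 2  # crawled content, compressed to roughly half size

for name, size in [("database dump", db_backup), ("crawled content", crawl_archive)]:
    print(f"{name}: {transfer_hours(size, LINK_MBIT):.1f} hours")
```

At that assumed rate, the nightly dump uploads in under half an hour (about 0.4 hours). The crawl archive, by contrast, ties up the link for nearly 89 hours, more than three and a half days. That gap suggests why moving the data, rather than storing it, becomes the hard part.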
Over time, I've dealt with this by storing more backups on machines that I control, across multiple datacenters, with an occasional long upload to Amazon for redundancy. But as backing up gets more complicated, it also gets further away from the simple procedure outlined in the TOS.
So I want to make the language a little less specific, so that it says I'm making (and testing!) backups without tying me to a particular implementation.
The current terms of service say:
Your bookmarks will be backed up nightly to an off-site datacenter.
And I'm changing them to read:
Your data will be backed up regularly, and the backups tested.
The switch from 'bookmarks' to 'data' is also significant, since people are using Pinboard to archive their tweets, notes, and those eight terabytes of crawled content, all of which are also backed up.
I would like to emphasize that nothing is changing today about the way I actually *do* backups. There are a number of data loss scenarios that I have tried to protect the site against:
I do something stupid (like deleting half the database with a typo)
A user makes a mistake (like deleting a whole mess of bookmarks)
There is a catastrophic hardware failure in one data center
A big earthquake destroys both hosting facilities
The FBI comes and takes all my servers again
Silent data corruption goes unnoticed for a while and makes the database unusable
A terrible person gets access to a Pinboard server and tries to do as much damage as possible
I drop dead.
The first three are easy to guard against with nightly snapshots stored in multiple places. The last five (particularly the malicious case) require more effort, including the occasional drive to San Jose and back with a storage appliance in the trunk (and my seatbelt fastened). Someday, when I am less afraid of tempting fate by talking about this stuff, I'll write in detail about how the backups work.
Please remember, though, that there's nothing better than a backup you make yourself. Take a moment to visit the Pinboard export page and grab your bookmarks every once in a while. They're in a format that should import cleanly into one of my many competitors. Best of all, since those sites are free, you can test your backups right now. Go do it!
—maciej on March 04, 2013
Pinboard is a bookmarking site and personal archive with an emphasis on speed over socializing.
This is the Pinboard developer blog, where I announce features and share news.