RSS

Pinboard Blog

A Bad Privacy Bug

On Friday, I made a site update that exposed private bookmarks for about a hundred Pinboard users. Any private bookmarks created through the browser between about 15:45 and 17:30 Pacific time were visible on public areas of the site.

I emailed affected users on Friday evening to inform them of what had happened. If you didn't get an email from me, it means your bookmarks were not affected. But if you have any doubts, by all means get in touch and I will take a look.

After losing data or passwords, this is the about the worst kind of mistake I can make.

The heart of the problem was that it turned out to be possible to ask the Pinboard database to "give me only bookmarks where the privacy flag is set to zero'" and still get back results where the privacy flag was on. This is like accidentally baking something by putting it in your freezer. Unexpected.

In my email to users, I promised to give a more public and technical explanation of what I did wrong:

The proximate cause of the bug was a small change in the 'save bookmarks' page:

      - $b->private   = post("private") ? 1 : 0; 

      + $b->private   = post("private");

In human language, the original line says 'set the private flag on the new bookmark to zero or one, depending on whether there was a 'private' field in the submitted form', and the replacement line says 'set the private flag to either nothing, or the value the user posted'.

I made this change believing that there was code later in the save process that coerced the value to either 0 or 1. There wasn't. That was mistake #1.

So now a new bookmark coming into the system would have $b->private set to 'yes' instead of 1.[*]

That shouldn't have been enough to break things. Mistake #2 was in the way I set up the database, way back in 2009. Here's the database definition for the bookmarks table :

CREATE TABLE bookmarks (  
    id int(11) NOT NULL,
    url mediumtext,      
    title varchar(255),  
    description mediumtext,  
    user_id int(11) NOT NULL,  
    toread tinyint(1) DEFAULT '0',  
    private binary(1) DEFAULT '0',  
    url_id int(11),     
    slug char(20),       
    snapshot_id int(11),  
    code char(3),        
    source smallint(6),   
    added_at datetime,      
    created_at datetime,   
    updated_at datetime,  
    PRIMARY KEY (id),  
) ENGINE=InnoDB DEFAULT CHARSET=utf8;  
   

Notice that 'private' is a binary field of length 1. This was a braino on my part - I was thinking of the fact that I just needed to record a single bit. But the binary column type in MySQL is intended for binary data, not Boolean values. A more correct way to store this kind of flag is as tinyint(1). Notice that the toread flag has this correct type.

An important difference between these data types is how they behave when you put in character data:

 mysql> create table demo(int tinyint(1), bin binary(1));
 mysql> insert into demo(int, bin) values ('y', 'y');
 mysql> select * from demo;

 +------+------+
 | int  | bin  | 
 +------+------+
 |    0 | y    |
 +------+------+

Notice that saving a non-number value in the tinyint column converts it to zero.

This would have let me catch the bug in testing, since a bookmark saved as private would have been displayed as public (the two are visually distinct).

But this is the part that I really did not expect:

 mysql> select bin from demo where bin = 0;

 +------+
 | bin  |
 +------+
 | y    | 
 +------+

So even though the row has (from our perspective) a non-zero value, MySQL sees it as matching zero.

This means that a database query for public bookmarks (which includes a 'WHERE private=0' restriction) will still return bookmarks that have all kinds of values in the private field.

When PHP converts such a row into a Bookmark object, it assigns the privacy status based on the value it gets from the database:


   $b = new Bookmark();
   $res = query('huge sql query here');
   $row = $res->fetchRow();
   foreach ($props as $p) {$b->$p = $row[$p];}

At this point the $b->private instance variable is set to 'y'. Because of how PHP works, any Boolean test of $b->private will now evaluate to 'true', and so we have here the miracle of a truly private bookmark returned from a strictly public query.

So mistake #3 was not trapping unexpected data types at load time.

Now, there are certain pages on the site (like /recent/ and /popular/) where it should never be possible to see private bookmarks. Mistake #4 was not enforcing this restriction in the display template. This extra check exists for private tags, but it did not exist for private bookmarks.

While it would not have prevented this privacy leak entirely, it would have limited the number of pages where private bookmarks could show up.

Finally, mistake #5 had to do with how I test this stuff. Instead of automated test suites, I use checklists before deploying major changes, performing a series of actions (like creating and editing bookmarks) to make sure everything works as expected.

But I do this from my own Pinboard account, which has superpowers. Specifically, it lets me see private bookmarks on all accounts. And since I already see everything, I don't notice when everybody else starts to see everything, too.

tl;dr: because of poor input validation and a misdesigned schema, bookmarks could be saved in a way that made them look private to the ORM, but public to the database. Testing failed to catch the error because it was done from a non-standard account..

There are several changes I will make to prevent this class of problem from recurring:

  1. Coerce all values to the expected types at the time they are saved to the database, rather than higher in the call stack.

  2. Add assertions to the object loader so it complains to the error log if it sees unexpected values.

  3. Add checks to the templating code to prevent public bookmarks showing up under any circumstances on certain public-only pages.

  4. Run deployment tests from a non-privileged account.

Moreover, I'm considering adding special behavior for accounts that are private by default. If you really, truly never want to save public bookmarks, there should be a way to tell the site that in exchange for additional safeguards.

One thing I won't do fix is the database schema. While the field type is incorrect, at this point a whole pyramid of working code sits on top of it, and mucking with it might break other things in unexpected places. Prudence suggests not touching it without lavish preparation.

To everyone who was affected by this bug, or who could have been if the timing was different, I would like to offer my sincere apology. After spending three years seducing the privacy-obsessed, it's pretty low form to suddenly break those guarantees.

While I've tried to compensate affected users, it's no substitute for not allowing these kinds of mistakes in the first place.

I'm around on Twitter as usual if you have questions or things to say.


[*] This is not to say there was no data validation done at all. Incoming data is sanitized in a separate step.

—maciej on July 31, 2012



Pinboard is a bookmarking site and personal archive with an emphasis on speed over socializing.

This is the Pinboard developer blog, where I announce features and share news.




How To Reach Help

Send bug reports to bugs@pinboard.in

Talk to me on Twitter

Post to the discussion group at pinboard-dev

Or find me on IRC: #pinboard at freenode.net