Pinboard Blog

The Matasano Crypto Challenges

I recently took some time to work through the Matasano crypto challenges, a set of 48 practical programming exercises that Thomas Ptacek and his team at Matasano Security have developed as a kind of teaching tool (and baited hook).

Much of what I know (or think I know) about security has come from reading tptacek's comments on Hacker News, so I was intrigued when I first saw him mention the security challenges a few months ago. At the same time, I worried that I'd be way out of my depth attempting them.

As a programmer, my core strengths have always been knowing how to apologize to users, and composing funny tweets. While I can hook up a web template to a database and make the squigglies come out right, I cannot efficiently sort something for you on a whiteboard, or tell you where to get a monad. From my vantage point, crypto looms as high as Mount Olympus.

To my delight, though, I was able to get through the entire sequence. It took diligence, coffee, and a lot of graph paper, but the problems were tractable. And having completed them, I've become convinced that anyone whose job it is to run a production website should try them, particularly if you have no experience with application security.

Since the challenges aren't really documented anywhere, I wanted to describe what they're like in the hopes of persuading busy people to take the plunge.

You get the challenges in batches of eight by emailing cryptopals at Matasano, and solve them at your own pace, in the programming language of your choice. Once you finish a set, you send in the solutions and Sean unlocks the next eight. (Curiously, after the third set, Gmail started rejecting my tarball as malware.)

Most of the challenges take the form of practical attacks against common vulnerabilities, many of which will be sadly familiar to you from your own web apps. To keep things fun and fair for everyone, they ask you not to post the questions or answers online. (I cleared this post with Thomas to make sure it was spoiler-free.)

The challenges start with some basic string manipulation tasks, but after that they are grouped by theme. In most cases, you first implement something, then break it in several enlightening ways. The constructions you use will be familiar to any web programmer, but this may be the first time you have ever taken off the lid and looked at the moving parts inside.

Here are the cryptographic topics covered:

Going into the challenges, I worried that my math wouldn't be up to the task. My impression of Serious Crypto was that it required all kinds of group theory, abstract algebra, elliptic curves, vector spaces, and other scary stuff. But while this may be true, the math content for the practical challenges was much gentler:

While the math concepts weren't hard, getting a real feel for them took work (and this was the point of the exercise).

If you're an experienced programmer, the Matasano challenges are also a terrific excuse to try a new programming language. It's always much more fun to solve real problems than it is to write a Manager object that inherits from Employee.

Here are the language features I found myself using most:

  • string manipulation (ranges, substrings)
  • bitwise operators
  • lookup hashes
  • conversion between string and number formats
  • big integer operations
  • packing and unpacking binary data
  • pattern matching
  • url manipulation
  • client/server interaction over a socket
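For a rough idea of what several of these features look like in practice, here is a minimal Python sketch. The values and the helper name `nibbles` are made up for illustration; nothing here is taken from the challenges themselves.

```python
import struct
from collections import Counter

# Conversion between string and number formats
raw = bytes.fromhex("48656c6c6f")         # hex string -> bytes
as_int = int.from_bytes(raw, "big")       # bytes -> (possibly huge) integer
back = as_int.to_bytes(len(raw), "big")   # integer -> bytes again

# Bitwise operators: split a byte into its high and low 4-bit halves
def nibbles(byte: int) -> tuple:
    return (byte >> 4) & 0x0F, byte & 0x0F

# Lookup hashes: count how often each character appears
counts = Counter("hello")

# Packing and unpacking binary data: a 4-byte and a 2-byte big-endian field
packed = struct.pack(">IH", 1366243200, 443)   # arbitrary example values
timestamp, port = struct.unpack(">IH", packed)

# Big integer operations: modular exponentiation is built in
small = pow(2, 10, 1000)
```

Python integers have arbitrary precision, so the same `pow` call works unchanged on numbers that are hundreds of digits long.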

Altogether it took me about three weeks to do the full cycle, working pretty intensively. Skilled programmers will find the going much faster, especially if you're comfortable with bit twiddling. Very few of the problems were downright hard, though some required several hours of work. I spent most of my time stepping through algorithms in pursuit of bugs, and in the process really got a feel for the moving parts in various cryptographic constructions.

I would compare the experience to having only ever read cookbooks and watched cooking shows, and then being asked to fry an egg. You know exactly what to do... in principle.

Some of the challenges have a payoff, in that you decrypt a short bit of secret text. This is incredibly fun. Seeing a cracked message come up on the screen after an evening of bug chasing reminded me of how it felt to be a kid in front of my Apple ][, finally getting it to beep or draw a circle or print DONGS all over the screen. Some of the later challenges even display the answer 'Hollywood style', where you get to see it decrypt one letter at a time in a cascade of print statements.

While the rules don't stipulate it, I think it's a good idea not to look at anyone's code if you try the challenges. The goal here is to convert message-board levels of understanding into actual knowledge, and the only way that works is if you bang your head on the task without seeing how anyone else has done it. Sean was a great help in navigating difficult spots, and the challenges are not set up to intentionally trick you. But you will need the kind of graph paper with the small squares.

What surprised me most:

  1. How practical these attacks were. A lot of stuff that I knew was weak in principle (like re-using a nonce or using a timestamp as a 'random' seed) turns out to be crackable within seconds by an art major writing crappy Python.

  2. There is no difference, from the attacker's point of view, between gross and tiny errors. Both of them are equally exploitable. In at least three challenges, the mere fact of getting distinguishable error messages was enough to recover the entire message.

  3. This lesson is very hard to internalize. In the real world, if you build a bookshelf and forget to tighten one of the screws all the way, it does not burn down your house.

  4. Timing attacks are much more effective than I imagined.

  5. Someone who can muck with your ciphertext is halfway to reading it, possibly with your secret key for dessert.

  6. Some mistakes are incredibly non-obvious. I had no idea you had to super-carefully pad RSA, for example.

  7. Even on a laptop, in 10 minutes you can do a terrifying amount of computation. It really is 2013.
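To make the timestamp-seed point concrete, here is a toy Python sketch of the general idea. It is an illustration written for this post's argument, not a solution to any challenge: `make_token`, `recover_seed`, the one-hour window and the example timestamp are all hypothetical, and it assumes the attacker knows roughly when the value was generated and which generator produced it.

```python
import random

def make_token(seed: int) -> int:
    """Hypothetical victim: seed a PRNG with a timestamp, emit a 'secret'."""
    rng = random.Random(seed)
    return rng.getrandbits(64)

def recover_seed(token: int, now: int, window: int = 3600):
    """Try every second in the last `window` seconds until one reproduces the token."""
    for candidate in range(now, now - window - 1, -1):
        if make_token(candidate) == token:
            return candidate
    return None

issued_at = 1366243200                  # arbitrary example timestamp, in seconds
token = make_token(issued_at)

# The attacker only knows the token and the approximate current time.
found = recover_seed(token, now=issued_at + 1234)
```

An hour of one-second timestamps is only 3,600 guesses. Microsecond precision multiplies that by a million, which is still a small search space for a laptop.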

I mentioned earlier that I thought every web programmer should try their hand at these. It is very illuminating to look at your own web app from the vantage point of an attacker actually writing code. At the very least, you will never be confused about cipher block modes again, or have to worry that someone will ask you to explain how a public key works in an interview. And there is a whole slew of dumb mistakes you will now avoid (replacing them with smarter mistakes that will become the subject matter of challenges 48-96).

The best part, from a web app developer's perspective, is that you never once write a SQL statement or HTML tag.

Here are some specific lessons from the challenges that I will apply to my own work:

  1. Keep meaningful data out of tokens (like cookies) that I hand out to clients. Use random values keyed against a database, memory store, or wherever.

  2. If I have to put data in tokens, include an integrity check, and pay a real crypto person to vet it.

  3. I must never seed a PRNG with a timestamp. I used to do this with microsecond precision thinking I was being clever. Then I went ahead and wrote a script that guessed the seed value in just a few seconds, and now I will never do that again.

  4. Use constant-time string comparisons when testing incoming data against some target value for authentication purposes. This is easy enough in most languages that it makes for cheap insurance.

  5. Anything related to authentication should only fail in one way. I must not provide distinguishable errors to the user.

  6. If possible, find a way to log the fact that someone is making a lot of weird queries against my site. For extra points, try not to make the logger itself hackable.

  7. No third-party javascript. I hated it already, now I hate it more.

  8. Cut off one of my fingers each time I re-use a nonce.
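A few of these lessons (random tokens, an integrity check, constant-time comparison) can be sketched with Python's standard library. This is a minimal sketch under stated assumptions, not vetted crypto: `SERVER_KEY`, `new_session_id`, `sign` and `verify` are hypothetical names, and a real site would load its key from protected storage rather than generate it at import time.

```python
import hashlib
import hmac
import secrets

# Hypothetical server-side secret (assumption: generated here only for the demo).
SERVER_KEY = secrets.token_bytes(32)

def new_session_id() -> str:
    """Lesson 1: hand out a meaningless random value and key it against server-side state."""
    return secrets.token_urlsafe(32)

def sign(data: bytes, key: bytes = SERVER_KEY) -> str:
    """Lesson 2: if data must travel in a token, attach an HMAC as an integrity check."""
    return hmac.new(key, data, hashlib.sha256).hexdigest()

def verify(data: bytes, tag: str, key: bytes = SERVER_KEY) -> bool:
    """Lesson 4: compare in constant time rather than with ==."""
    expected = sign(data, key)
    return hmac.compare_digest(expected.encode(), tag.encode())

tag = sign(b"user=42")
ok = verify(b"user=42", tag)          # untouched data passes
tampered = verify(b"user=43", tag)    # modified data fails
```

`verify` returns only True or False, which fits lesson 5: the caller gets no hint about why a check failed.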

Having read this post, you can go to Hacker News and comment in Talmudic detail about what is right or wrong in the conclusions I drew. But a much better idea is to just email Sean and have a crack at the challenges yourself. You will have a good time!

One final observation. Crypto is like catnip for programmers. It is hard to keep us away from it, because it's challenging and fun to play with. And programmers respond very badly to the insinuation that they're not clever enough to do something. We see the F-16 just sitting there, keys in the ignition, no one watching, lights blinking, ladder extended. And some infosec nerd is telling us we can't climb in there, even though we just want to taxi around a little and we've totally read the manual.

Doing these challenges is a great way to 'shake your sillies out', as Raffi might say, without hurting yourself or your users. You get to put on the flight suit, climb into the simulator, and crash that plane in every conceivable way.

I would like to sincerely thank Thomas and Sean and everyone at Matasano who worked on these challenges, and implore people in other technical fields to consider offering something similar. It's the most fun I've had programming in years!

—maciej on April 18, 2013



Little Green Locks

This weekend I made some changes to Pinboard that should make it safer to use the site on public networks, like in a library or cafe, without the risk of anyone snooping on you.

Pinboard now forces all users to connect via a secure connection. This was already the default for over half of Pinboard users who had set the 'secure connections' preference, but from today it applies to everyone who is logged in.

A small number of you may see a login screen the next time you use Pinboard. This will only happen once. Most people will not notice any changes to the site at all.

If you find something has broken for you, or you are getting security-related errors in your browser, please email me a bug report and screenshot to support@pinboard.in.

—maciej on April 15, 2013



Tag Editing, Network Filters and Fandom Search

This weekend I pushed out three new features. Since they have yet to take down the site, it's probably safe to announce them:

Tag Editor

A persistent shortcoming of Pinboard is that there's no way to do tag gardening. In particular, it has not been possible to remove lots of tags at once, view a list of tags that you've only used once, or search within your tags by substring.

The tag editor is here to help. You can load it by clicking the 'manage' link next to your tag cloud. The editor has a number of features, so I encourage you to take a quick look at the documentation before you give it a spin.

As with many tag-related endeavors, it can be hard to strike a balance that works for both the average user (with 150 unique tags) and the tagging elite (some of whom have over 10,000 tags). Please let me know if I did something wrong.

Right now, the editor will only let you delete tags. I'm working on additional actions, like rename and merge.

Network Filters

Another long-standing criticism of the site is that auto-added links (mainly those from Twitter) make the network feature unusable.

For some people, Twitter links on the network page are pure noise. For others, Twitter is an essential part of getting links into Pinboard. So instead of imposing my view on everyone, I've added a filter link to the network page so you can decide for yourself. Turn the filter on and you won't see any more bookmarks that originated from Twitter in your network feed.

Unfortunately, it's not possible to set this kind of filter on a per-user basis in your network.

Fandom Search

For months now it's been possible to declare yourself a part of fandom on the Pinboard settings page, but apart from making people feel good, the checkbox had no practical effect. I've finally changed that by offering a version of the sitewide search engine scoped only to users who self-declare as fans. This should make it easier to find fic using common keywords that would get drowned out on the main search page.

Note that all three of these features were suggested by clever Pinboard users. Keep those cards and letters coming!

—maciej on March 25, 2013



Search Reform

This weekend I added two new behaviors to Pinboard search:

  1. You can search any user's account by using the search box on their user page.

  2. Similarly, using the search box on any network page will restrict the search to people in that network.

More search reform is on the way, as I work through a small mountain of (very good) search-related feature requests. Please feel free to add to the mountain.

—maciej on March 18, 2013



Privacy Lock

As you saw in the previous post, many Pinboard users choose to keep their accounts completely private. About 12,000 have the 'add everything as private' setting enabled for that reason.

There are two big problems facing private users. The first is that the site is ugly. Private bookmarks have a dark background by default, to make it easy to distinguish them at a glance from public bookmarks. But this design only works well if you have few private bookmarks mixed in with a lot of public bookmarks (by a curious coincidence, my own usage pattern). Otherwise your bookmarks become a busy grey mess.

The second problem is that it's too easy to make a bookmark public by accident, particularly if you use the API. While it doesn't happen often, there's no reason these controls should be there if you don't intend to use them.

Today I'm adding a new feature to address these problems, called Privacy Lock. Privacy Lock will make it impossible to add public bookmarks to your account in any form while the setting is turned on. It will also make private bookmarks appear without a dark background, so the site is easier on the eyes.

In order to turn on Privacy Lock, you'll first have to make sure all your bookmarks are private. You can make them private by using the bulk edit link at the top of your home page.

Once the setting is on, you'll see a little padlock next to your username:

Your bookmarks will have a pretty white background, and you'll see there's no way to make them publicly visible. You can turn Privacy Lock off whenever you like, and your account will go back to its normal self.

You can toggle the new setting in the privacy tab of your settings page.

—maciej on March 05, 2013



Some Numbers

Every once in a while it's fun to take a look and see how people are actually using the site. Here are some numbers I pulled up this morning:

Bookmarks stored: 68 million
Active users: 22,400
Known URLs: 42 million
Data archived last month: 456 GB
Tweets archived last month: 1.4 million
Users with only private bookmarks: 6,700
Users with only public bookmarks: 10,100
Users with only unread bookmarks: 396
Users without a single bookmark: 845
...and who paid for archiving: 23
Fandom, self-declared: 2,717
Use a non-English version of Pinboard: 1,556

—maciej on March 05, 2013



API Maintenance This Saturday

The Pinboard API will be offline between 14:00 and 16:00 PST this Saturday, March 9, 2013.

I'm installing some new hardware and will have to reboot the server a few times to make sure it's working properly. I'll try to keep the service interruption as brief as possible, and send out updates via Twitter.

Please email me if you have any questions!

—maciej on March 04, 2013



A Word on Backups

Today I'm making a change in the Pinboard terms of service regarding backups.

For the first few years of running the site, my backup policy was pretty simple: dump the full database every night, and upload that file to S3. That worked fine while Pinboard was small, but it's not a good solution now that Pinboard is medium-sized.

The full database right now takes up 153 GB of disk space, and compresses down to a 17 GB backup file. The site also stores eight terabytes of crawled page content, which compresses down to just about half size.

When data gets big like this, the real problem is not where to keep it, but how to move it.

Over time, I've dealt with this by storing more backups on machines that I control, across multiple datacenters, with an occasional long upload to Amazon for redundancy. But as backing up gets more complicated, it also gets further away from the simple procedure outlined in the TOS.

So I want to make the language a little less specific, in order to communicate the fact that I'm making (and testing!) backups without tethering myself to a specific implementation.

The current terms of service say:

Your bookmarks will be backed up nightly to an off-site datacenter.

And I'm changing them to read:

Your data will be backed up regularly, and the backups tested.

The switch from 'bookmarks' to 'data' is also significant, since people are using Pinboard to archive their tweets, notes, and those eight terabytes of crawled content, all of which are also backed up.

I would like to emphasize that nothing is changing today about the way I actually *do* backups. There are a number of data loss scenarios that I have tried to protect the site against:

  • I do something stupid (like deleting half the database with a typo)

  • A user makes a mistake (like deleting a whole mess of bookmarks)

  • There is a catastrophic hardware failure in one data center

  • A big earthquake destroys both hosting facilities

  • The FBI comes and takes all my servers again

  • Silent data corruption goes unnoticed for a while and makes the database unusable

  • A terrible person gets access to a Pinboard server and tries to do as much damage as possible.

  • I drop dead.

The first three are easy to guard against with nightly snapshots, stored in multiple places. The last five (particularly the malicious case) require more effort, including driving back and forth every once in a while to San Jose with a storage appliance in the trunk (and my seatbelt fastened). Someday, when I am less afraid of tempting fate by talking about this stuff, I'll write about the way backups work in detail.

Please remember, though, that there's nothing better than a backup you make yourself. Take a moment to visit the Pinboard export page and grab your bookmarks every once in a while. They're in a format that should import cleanly into one of my many competitors. Best of all, since those sites are free, you can test your backups right now. Go do it!

—maciej on March 04, 2013



UK Meetup Aftermath

The UK meetup drew the biggest crowd yet, with a nearly double-digit turnout. Thanks to everyone for coming!

—maciej on February 21, 2013



World of Pinboard

Slowly making my way home to Poland for a visit gave me a chance to meet a number of Pinboard users along the way. I am grateful to everyone who took the time to join me for the impromptu meetings documented below. I've found that meeting actual human beings who use my site is a valuable weapon in the fight against burnout. But it's been even nicer to discover that some really nice people use Pinboard. This encourages me to try to be a nicer guy myself.

Refusing to stay in focus in Warsaw:

Conquering Lyon with @anatsuno:

Friendly Pinboard people in Paris:

—maciej on February 14, 2013



Pinboard is a bookmarking site and personal archive with an emphasis on speed over socializing.

This is the Pinboard developer blog, where I announce features and share news.




How To Reach Help

Send bug reports to bugs@pinboard.in

Talk to me on Twitter

Post to the discussion group at pinboard-dev

Or find me on IRC: #pinboard at freenode.net