Pinboard Blog

After a week of slogging servers around northern California, I thought a brain dump on colocation might be useful to readers, and to future me.

I wrote about the difference between colocation, leased servers and other kinds of hosting in an earlier post. This one is strictly about colocation, the 'Condo' approach where you own a bunch of hardware and need a place to put it.

What you are after is not complicated:

A physical enclosure
Some kind of Internet connection
Power
Security guards

However, buying it is a pain in the neck.

Money

It's typical to sign a one- or three-year contract. Right off the bat this introduces an element of pressure, since you're making a fairly binding, long-term decision without knowing what you're doing. Unless you live in the Bay Area, colocation is a cost on par with your rent or mortgage.

Worse, I've found that the cost for identical configurations can vary by a factor of three or more. You really do need to shop around. And you need to be especially careful of punitive terms for things like overstepping bandwidth or power requirements. These are things you have to dig out of the fine print of contracts at a moment when you just want the whole process to be over.

Salespeople

Renting colo space feels like buying a car. You typically decide on a specific configuration you want, and then ask for quotes. For the salespeople involved, this is just the start of a long conversation they want to have with you. They'll be very curious about your budget, and want to talk about the hosting equivalent of an underbody clearcoat (various "hybrid cloud solutions") and extended warranty.

Although colo space is a commodity, salespeople become tetchy if you treat it as such. They will insist on talking to you over the phone and bristle at the suggestion that their job could be replaced by a web form. It is a good idea not to think about how much their salary or commission adds to your costs.

The reason I say colo space is a commodity is not because all facilities are the same, but because small-time clients will have no practical way of assessing their quality. There are certainly some facilities that are obviously bad, but most data centers have sane policies, look nice if you visit, and talk eloquently about uptime. The only way to evaluate a data center is to go through a series of small and big outages together, but by then you're already in a multi-year contract. So in practice, there is not enough information to pick a "better" data center, just a bunch of anecdotes on message boards. I believe you're better off getting space in two cheap places than trying to pick one high-end one.

Resellers

A lot of colo space is resold through intermediaries. This is often the only way to get a smaller amount of space than a full cabinet. Someone rents a bunch of cabinets at a data center and then parcels them out by the slice to clients, in pieces as small as 1u.

There are two things to watch out for in this arrangement:

A bunch of people will have access to your physical equipment. Good resellers will take pains to introduce the various people in a cabinet to one another (or at least provide contact info).
If your provider gets in a dispute with the data center, you may not be able to physically take out your hardware. It may even be seized without advance warning.

Power

Power is the great bugbear in hosting. You need to know how much your equipment uses, and how much you're likely to need. It is also often the limiting factor.

A useful rule of thumb in my case has been 1 rack unit = 1 amp. However, it is quite difficult to estimate the power consumption of a server before you buy it. You end up having to plug it in to a Kill-A-Watt meter under normal load.

A full cabinet is somewhere around 42 units, but a typical full cabinet power allotment is 20 amps. So you can't just fill a cabinet with servers.

The power situation is even worse than it sounds. That 20 amp figure is strictly theoretical, since you aren't allowed to use the full amount. You are limited to 80% of this figure, so a "20A" cabinet has 16A usable power, enough for eight pinboard-style servers.

Finding Datacenters

Finding information is another pain in the neck. The place of choice is an awful forum called WebHosting Talk. There is an open business opportunity to anyone who can stick a front-end interface on this site that lets you enter number of servers, bandwidth, physical location, and spit out a list of offers.

Another business opportunity is to make an authoritative directory of Bay Area data centers, since there is a bewildering assortment of sellers, re-sellers, and re-re-sellers offering the same physical space. Conversely, some large providers maintain multiple facilities.

Earthquakes

Nobody likes to talk about earthquakes. But anything in the Bay Area or Seattle is going to come crashing down at some point. Another thing that has proven impossible is finding out what facilities are at highest risk. It's one thing to go offline when the Big One comes (half the Internet will be down with you). But losing a rack full of hardware into the maw of the earth is worse.

So here's my plea to hackers: figure out where stuff is physically hosted, correlate it with seismic hazard maps, and make a nice web form that lets people shop for specific power/bandwidth/space configurations without talking to salespeople. Charge money for it! I will pay you! Others will pay you!

—maciej on August 03, 2013

Seeking Bay Area Colo Space

August 02, 2013

I succesfully moved my backup servers to Sacramento this week, but I'm still looking for a Bay Area colo for the main site, which has outgrown its current home.

Here's what I'm hoping to find:

Half or full cabinet
100 Mbps capped bandwidth
20 A @120 V
One or three-year contract
Full 24/7 physical access, within 100 km of San Francisco

The best offer I have in hand right now is from HE at their Fremont 2 facility, who are asking $600 per month for a full cabinet, but with skimpy power (15A) and a $200 setup fee to install square-hole posts.

Make me a better offer and I'm yours! Email me at maciej@ceglowski.com.

—maciej on August 02, 2013

Upcoming API downtime

July 28, 2013

I've found out on short notice that I must vacate my current hosting facility before the end of the month. This will mean physically moving about eighty pounds of PInboard machinery from San Jose to a new home in Sacramento.

The servers involved run the Pinboard API. I'm going to try switching all API traffic to the web server (which is hosted elsewhere and will not be affected by the move) but if it turns out to be too much load, I will need to take the API offline.

In the pessimistic case, the API will be down from early Monday evening California time until Wednesday morning.

The outage will also affect archiving. About half of PInboard users won't be able to reach their archives during the outage. I will extend affected archiving accounts by one week as compensation.

The website and RSS feeds will remain up and running.

Nothing makes me feel more alive than a midsummer motor car ride through California's gorgeous Central Valley. I apologize to my users for this self-indulgence, and promise not to make it a regular habit.

—maciej on July 28, 2013

Pinboard Is Four Years Old

July 09, 2013

Today marks four years since I opened the creaky gates and started charging customers money for Pinboard.

Here are some site stats for this year, compared with one and two years ago:

	2010	2011	2012	2013
bookmarks	3.5 M	27 M	53 M	76 M
tags	11 M	76 M	135 M	178 M
active users	2.8 K	16 K	23 K	23 K
bytes archived	200 G	3.0 T	5.9 T	8.8 T
downtime	6 h	29 h	22 h	12 h*
unique URLs	2.5 M	16 M	32 M	48 M

* this is site downtime; API downtime was much worse, perhaps 48 hours in all

The site has continued to grow at a steady clip, adding as many bookmarks and tags this year as last year. Total revenue from signups and archiving has been far steadier than I expected from a web project, which by nature tends to be spiky. This comes as a considerable relief, since it means I don't have to hunt for a new brand of champagne or truffle oil every other month. The number of active users has found a steady state, with as many people joining the site as dropping off of it in any given month. Before growing the site much further, I would like to get better at handling support requests at this level.

It's been an active year behind the scenes. On top of the usual code gardening, I spent some time working to better secure the site, introducing API tokens for password-free authentication, moving everyone to TLS without breaking everything, and adding various cookie flags and HTTP headers to make the site a little more resilient to bad people in public places.

The major new features this year were tag bundles, privacy lock, major improvements to search, and the beginnings of a bulk tag editor.

I gave talks on Pinboard at Brooklyn Beta, CUSEC, and InfoShare Gdansk, and in the process got to meet Pinboard users in Osaka, Warsaw, Paris, Lyon, Stockholm, London and Berlin. This was an enormous amount of fun for me, and really helped to get me out of the house.

Finally, I made my first foray into venture capital with the Pinboard Investment Co-Prosperity Cloud. The six winners have been hard at work, and I look forward to writing about what they've achieved in the coming weeks.

Thanks to everyone who has helped me with this project over the past four years, in big and small ways. July 9 is one of the happiest days of my year, and I owe it to all to kind people from across the Internet.

—maciej on July 09, 2013

Persuading David Simon

June 15, 2013

[June 19 update: David Simon has been kind enough to respond at length here. He points out that I falsely stated that collecting call records requires a warrant; I have corrected that statement in the post below.]

I read with interest David Simon's recent blog post in which he responds to revelations that the NSA has been collecting the call records of all American mobile phone users.

David Simon, of course, created the Wire, a television series where institutions take on lives of their own and defy attempts by well-meaning people to reform them from within. So it came as a real shock to find Simon criticizing pundits who have objected to the extent of NSA surveillance, and accusing them of wilful ignorance about the nature of police work.

Mr. Simon pointed out that law enforcement agencies have been allowed to capture call records for decades, including in cases where the information harvested includes calls from people who are not under suspicion. In other words, there's nothing new going on to get worked up about.

Having labored as a police reporter in the days before the Patriot Act, I can assure all there has always been a stage before the wiretap, a preliminary process involving the capture, retention and analysis of raw data. It has been so for decades now in this country. The only thing new here, from a legal standpoint, is the scale on which the FBI and NSA are apparently attempting to cull anti-terrorism leads from that data. But the legal and moral principles? Same old stuff.

Seeing no difference in principle, only a difference in degree, in the NSA's surveillance program, Simon expresses annoyance with Americans who demand total protection from terrorism and then purport to be shocked when their government takes their requests seriously.

Mr. Simon cites the specific example of an investigation he covered as a police reporter in Baltimore in the 1980's. Criminals were using pay phones and pagers to evade detection, and tracking them down required indiscriminately recording numbers dialed from those pay phones, with the goal of sifting through the data later to find the pager numbers.

He argues that this kind of investigation, which targeted pay phones, was in some ways more invasive than the kind of tracking the NSA is accused of, since people expect to be anonymous when using a pay phone in a way that doesn't apply when they're calling from their own cell.

There is certainly a public expectation of privacy when you pick up a pay phone on the streets of Baltimore, is there not? And certainly, the detectives knew that many, many Baltimoreans were using those pay phones for legitimate telephonic communication. Yet, a city judge had no problem allowing them to place dialed-number recorders on as many pay phones as they felt the need to monitor, knowing that every single number dialed to or from those phones would be captured. So authorized, detectives gleaned the numbers of digital pagers and they began monitoring the incoming digitized numbers on those pagers — even though they had yet to learn to whom those pagers belonged. The judges were okay with that, too, and signed another order allowing the suspect pagers to be “cloned” by detectives, even though in some cases the suspect in possession of the pager was not yet positively identified.

I think Simon's fundamental argument, “same old stuff”, is mistaken in a number of important ways, and that some of this reflects our failure as technologists to communicate what modern surveillance can do.

First, there is the scope of the order. The Baltimore operation, and others like it, were limited to a specific criminal investigation. They were obtained ~~under a warrant~~ under a subpoena setting limits on what would be collected, and for how long.

The NSA program is universal and appears to be open-ended. Information is collected in aggregate. The program operates under the authority of secret court order, not a warrant. It is not clear whether the Administration even believes this type of surveillance requires a court order.

Second is the nature of the body carrying out the surveillance. In Simon's example, this was a municipal police force, overseen by a local court.

In our case, it's the NSA, a Federal agency whose job has traditionally been to collect foreign signals intelligence ①. The operation is overseen by a secret court system called FISC.

Third is the nature of the data being collected. When the Baltimore investigation took place, it collected a simple list of telephone numbers dialed from the monitored phones.

Modern call records contain much more data, reflecting the fact that almost all of us carry cell phones. A call record now includes unique device identifiers, routing information, cell tower IDs, and a wealth of additional information about the circumstances and location of the call. The location data is particularly powerful, turning mobile phones into de facto tracking devices whenever they are turned on.

Fourth is the question of oversight. The evidence used in the Balitmore case was collected by municipal police and presented (I'm assuming) in open court. Those against whom it was used had the chance to mount a defense, appeal the verdict to state and Federal courts, and enjoyed the presumption of innocence guaranteed to them by the Constitution.

The NSA call data is collected and used in secret. The agency is overseen as part of the very large national security establishment by a small, overworked group of legislators and senior government officials who have the requisite security clearance.

So I contend that the parallel Simon makes is false. The NSA is not a law enforcement organization, and intuitions from police reporting don't carry over.

But even if we grant the analogy, I think there's a more dangerous argument in Simon's essay, which is the contention that two programs that differ only in degree are necessarily "the same old thing". I believe this is not a safe assumption to make when talking about computers and their use in domestic surveillance.

In the portion of his essay that excited the most comment, Simon appears to express disbelief that the NSA can make broad use of the data it gathers:

When the government grabs every single fucking telephone call made from the United States over a period of months and years, it is not a prelude to monitoring anything in particular. Why not? Because that is tens of billions of phone calls and for the love of god, how many agents do you think the FBI has? How many computer-runs do you think the NSA can do — and then specifically analyze and assess each result?

Well, of course, the answer is "you would not believe how many 'computer-runs' the NSA can do". I believe this part of the essay especially caught tech people's attention, since it suggested that Simon might be naive about the capabilities of a modern datacenter. It's certainly the part Clay Shirky pounced on in his rebuttal.

But Simon is not a fogey who doesn't understand how powerful computers have become (though I feel that there are such people in positions of oversight in the House and Senate). I believe his error is in assuming that the analysis of these 'computer-runs' is any kind of bottleneck. There are powerful techniques for surfacing interesting features in any comprehensive list of interactions between human beings. I've written in the past about my distaste for the 'social graph' and the perverse worldview it imposes on our projects, but part of the appeal of that worldview is the real power of mathematics applied to exactly this kind of data. The analysis can be automated, and no good comes of it.

In a beautiful worked example, Kieran Healey has shown how a precocious British intelligence service could have identified Paul Revere as a person of particular interest based only on a set of membership lists of organizations he belonged to.

The point is, you don't need human investigators to find leads, you can have the algorithms do it. They will find people of interest, assemble the watch lists, and flag whomever you like for further tracking. And since the number of actual terrorists is very, very, very small, the output of these algorithms will consist overwhelmingly of false positives.

It's at this point that Simon's logic starts to work in the other direction. Given a long list of potential leads, investigators are going to focus on vetting the most likely, rather than taking any steps to clear false positives out of system. The penalty for missing a real terrorist is catastrophic, while the penalty for falsely accusing someone (when not only the accusation, but the very existence of the program, is secret) is nonexistent, even if the secret accusation ends up doing real harm. Limits on manpower won't constrain the investigation; they will only reduce its overall quality.

This isn't an abstract argument. We are all familiar with the tenebrous no fly list, a document that prevents several thousand people from traveling by air, and condemns thousands more to intrusive security measures each time they want to get on a plane. After 2001, this list rapidly expanded to thousands of names, with no avenues of appeal and no way to even check whether your name appeared in the document, to the point where the government finally had to improvise a 'redress' policy for travelers who found themselves living out a Kafka novel.

Characteristically, proposals for fixing the no-fly list and similar watch lists now call for collecting even more information, to help disambiguate people who share a name but not a date of birth with someone on the watch list. The basic problem—that lists of suspects are generated without accountability, without oversight, and with no incentive to avoid mistakes—persists.

There's also a more dangerous institutional problem to consider. When a system like this exists, it creates pressure for its own use. What is the point, after all, of having a very elaborate, extremely expensive ② database if you are only ever going to use it in exceptional cases? It is the nature of law enforcement to want to go after bad guys with all available tools. We saw a vivid demonstration of this in the years after the 2001 attacks, when the administration attempted to blur the lines between the 'War on Drugs' and the 'War on Terror', arguing that the proceeds from narcotics sales paid for terrorism.

Consider, too, a technique that has become standard in Federal investigations. It is a felony to make false statements to a Federal agent, and investigators routinely make use of this fact to gain leverage over a witness or suspect. People tend to be nervous when they talk to police, and unless they know better are liable to give inconsistent answers during questioning. Good interrogators can convert each of these inconsistencies into a felony count. Imagine how much more potent this tactic becomes when investigators can gain access to a database of your movements and contacts for the past decade.

The security state operates as a ratchet. Once you click in a new level of surveillance or intrusiveness, it becomes the new baseline. What was unthinkable yesterday becomes permissible in exceptional cases today, and routine tomorrow. The people who run the American security apparatus are in the overwhelming majority diligent people with a deep concern for civil liberties. But their job is to find creative ways to collect information. And they work within an institution that, because of its secrecy, is fundamentally inimical to democracy and to a free society.

I can't believe that David Simon, of all people, doesn't see the danger inherent in a permanent domestic surveillance program. I doubt that he would support a government initiative for all Americans to wear tracking devices in the name of fighting terrorism. Yet the NSA data collection program, whose output is functionally identical, seems not to trip the same alarm bells with him.

-:-

In public statements, the NSA director has defended domestic surveillance as a vital tool in preventing terrorism.

The term 'terrorism' is a magic word, unlocking government powers we normally associate with wartime. The current and previous Administration have, at various times, asserted the right of the government to conduct invasive and open-ended surveillance on people it suspects of terrorism, detain suspects in terrorism cases indefinitely without trial, 'render' them to countries for interrogation and torture, kill people it considers terrorists, including American citizens, with giant flying robots, or keep such people alive against their own will.

This is total power over human life. The authorities assure us that numerous checks exist to prevent abuses of this power, but of course the checks are also classified. The government is promising that the secret police won't put innocent people in the secret prisons because the secret courts would never allow it.

This system puts enormous pressure on a small group of fallible human beings. For the secrecy to work, the number of people in on the secret must be small ③. But this group is all part of the same hierarchy, subject to the same pressures, and unable to communicate its concerns outside the same closed circle.

Talk of secret prisons, indefinite detention, and force-feeding can sound tendentious (though it's all uncontested public record!). Americans have a deep faith in the rule of law and have not proven receptive to the argument that truly innocent people will find themselves placed in the "terrorist" category by accident.

There is a tendency among those who grew up under the rule of law to treat it like the Rock of Ages, an immovable substrate in which all the institutions of the state are forever anchored. And so even ordinarily skeptical people tend to assume that the government obeys its own laws when no one is looking. To an astonishing extent, and to the great credit of American civic life, this is actually true.

But I think a better metaphor for the rule of law is that it is the soil in which democratic institutions take root. Like the soil, it can be depleted. And once depleted, it is not easily replenished.

Secrecy erodes the rule of law because it makes democratic accountability impossible. Secrets can't be held too broadly, so secrecy concentrates responsibility and asks too much of human nature. That is why every intelligence agency, unless given rigorous outside oversight, commits terrible excesses.

I think Simon agrees about the perniciousness of this secrecy. In a later rebuttal he's called for a modern-day version of the Church Committee, a group of people from outside the security establishment with top-secret access and the power to compel testimony.

And I agree with Simon that the current state of affairs is the "inevitable consequence of legislation that we drafted and passed."

American politics since the Cold War has operated under the conceit that national security must transcend partisan differences. And so we have seen large bipartisan majorities voting for pre-emptive war and domestic surveillance even though both of those policies were highly controversial outside Congress.

This tradition has created a vast space beyond political accountability. When both political parties pursue a nearly identical policy, there are no electoral consequences when the policy proves disastrously wrong. Who do you vote against?

People have good intuitions about the danger of indiscriminate collection and retention of their data. They're not being hysterical. For the last decade, we've been concentrating on how to regulate the way this data gets used in the private sector. But now that the coercive power of the state has entered the picture, the stakes are much higher, and we have an opportunity to politicize the debate. David Simon tells us to resign ourselves to the consequences of technological change:

"The question is not should the resulting data exist. It does. And it forever will, to a greater and greater extent."

But I think that is wrong. Whether the data should exist, and for how long, is exactly the question. The answer is not a technological inevitability, but a political choice.

I believe a world in which everything is recorded and persists forever carries the seeds of something monstrous ④. It is in the nature of computer systems to remember things indefinitely, but there's nothing difficult about programming machines to forget. It just requires laws to do it. We can't treat it as a technical problem. And to get the laws passed, we need to politicize the issue.

Still, these barricades are going to seem awfully lonely if we can't even get David Simon up there with us. The man should be a natural ally, and the fact that he sounds so exasperated troubles me. The fact that he seems resigned to a future of total information retention troubles me. The fact that we are talking past each other troubles me most of all.

① Simon also mentions the FBI, but it's unclear to me that this agency has anything to do with the accusations of widespread call monitoring.

② The expensive part is keeping everything secret, and staffing it with people cleared for such access. The database itself is likely quite modest in size.

③ The Washington Post has estimated the number of people with Top Secret clearance at 854,000. The number of people with full knowledge of all secret programs is much smaller, as this information is carefully compartmentalized.

④ Except Pinboard archives. Those are great!

—maciej on June 15, 2013

Berlin Meetup Aftermath

June 09, 2013

Our meetup in Berlin yesterday proved to be the best-attended one yet, with fifteen Pinboard users and one extraordinarily patient child fighting their introvert instincts on a beautiful spring day.

We had a diverse group of people from across the US and Europe, ranging from designers and web developers, to game developers and even a mobile app developer! Luckily everyone found a common language.

In order not to drink beer on an empty stomach, I stopped in a Bavarian restaurant before the meetup and ordered the item below, about which I would like to make the following observations:

Wrapping this thing in bacon would actually make it healthier.
It appears to be made from the heart of the last person to order it.
Clicking 'enhance' on this photo crashes iPhoto and turns its icon into a ham hock.
It's hard to see, but the parsley is hovering a few millimeters above the meat field.
If I ever visit Bavaria, back up your bookmarks.

I sincerely thank everyone for coming and hope to see you in Berlin again soon!

—maciej on June 09, 2013

« earlier

later »

Pinboard is a bookmarking site and personal archive with an emphasis on speed over socializing.

This is the Pinboard developer blog, where I announce features and share news.

How To Reach Help

Send bug reports to bugs@pinboard.in

Talk to me on Twitter

Post to the discussion group at pinboard-dev

Or find me on IRC: #pinboard at freenode.net