You may or may not have noticed that yesterday, www.dreamhost.com was offline and unreachable for the better part of 6 hours. We can’t let something like that go without an explanation.
I should note that during this time nothing was affected other than one customer site (which I’ll get to) and the main “www.dreamhost.com” domain. Customer sites were up, our web panel was up, everything was up…including the ire of some tech-savvy Muslims!
We’ve got a fairly liberal free-speech policy here which we’re quite proud of. Speech that is protected by the United States Constitution’s First Amendment is protected by DreamHost. While we don’t always agree with the content of the sites we host, we do support their right to host it in America!
Yesterday was Draw Mohammad Day.
This did not sit well with roughly 21% of the world’s population.
We happened to be hosting drawmuhammadday.com, a site that encouraged people to draw images of Mohammed. That’s kind of a no-no in the Muslim world.
Incidentally, did you know there’s like a million different ways to spell Mohammed?
In the spirit of yesterday’s event, but without the offensive parts, I’ve drawn some pictures to show you what you might have missed!
Some people weren’t too keen on the idea of the Draw Mohammad Day website and suddenly we were the target of the largest Distributed Denial of Service (DDoS) attack we’ve ever seen. drawmuhammadday.com was the first to fall. It was the main target and it didn’t take long…based on our stats it looked like almost the entire country of Pakistan was attacking us! Well, not really. But nobody in Pakistan could reach YouTube, Facebook, or Twitter yesterday, so what else were they gonna do?
These weren’t just random attacks from here and there. We saw several Pakistani groups targeting us on their blogs, often providing step-by-step directions and automated tools for launching e-assaults on dreamhost.com and drawmuhammadday.com.
They did not let up once the site was down. At one point dreamhost.com (the site itself) was handling around 20,000 requests per second. To put that number in perspective, even on a busy day when our customers’ sites have traffic surges, that figure might only get up to ten or even twenty requests per second.
Our load balancers, as great as they are, typically handle about 4,000 connections at any given moment. During the attack they made it up to 400,000 before they seized up and crapped out. We believe that even the most top-shelf battle-hardened load balancing options would not have been able to withstand an attack of this scale – a quick jump in traffic about 100x larger than normal traffic patterns we see on any given day.
Our fault-tolerant setup relied on those load balancers and they proved to be our undoing. Luckily, only some services (webmail being one of them) were affected, and only for a few minutes before we got them going again.
To restore services we had to take the site down altogether while we moved it to newer, stronger hardware, beyond the reach of our load balancers. We tuned the Linux kernel on this new machine aggressively to use less memory for TCP connections. We also abandoned Apache, favoring a specialized nginx installation.
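For a rough sense of why per-connection TCP memory matters at this scale, here’s a back-of-the-envelope sketch. The function names and buffer sizes are made-up illustrations, not the actual values from our tuning; only the 400,000 connection figure comes from the numbers above. It computes a worst-case ceiling (every connection filling both its receive and send buffers), which real traffic rarely hits.

```python
def worst_case_tcp_buffer_bytes(connections: int, recv_buf: int, send_buf: int) -> int:
    """Upper bound on socket buffer memory if every connection filled both buffers."""
    if connections < 0 or recv_buf < 0 or send_buf < 0:
        raise ValueError("inputs must be non-negative")
    return connections * (recv_buf + send_buf)


def to_gib(num_bytes: int) -> float:
    """Convert a byte count to gibibytes for easier reading."""
    return num_bytes / 2**30


if __name__ == "__main__":
    peak_connections = 400_000  # peak connection count quoted in this post

    # Hypothetical per-socket buffer sizes in bytes (illustrative assumptions only).
    roomy = worst_case_tcp_buffer_bytes(peak_connections, recv_buf=87_380, send_buf=16_384)
    lean = worst_case_tcp_buffer_bytes(peak_connections, recv_buf=4_096, send_buf=4_096)

    print(f"roomy buffers: {to_gib(roomy):.1f} GiB worst case")
    print(f"lean buffers:  {to_gib(lean):.1f} GiB worst case")
```

The point is only the ratio: shrinking what each connection is allowed to hold shrinks the ceiling by the same factor, which is what makes hundreds of thousands of mostly idle attack connections survivable on one box.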
When we flipped the switch to get dreamhost.com up and running again at around 2PM PDT, the attack load had dropped to 130,000 simultaneous connections with over 20,000 requests per second. The new setup took it like a champ and continues to perform well today – even while we’re still seeing elevated traffic as a result of lingering attacks.
We’re proud to say (and repeat!) that customer sites were not affected and our control panel was still reachable during this entire debacle. And of course, if you ever suspect server problems with your DreamHost account, be sure to check dreamhoststatus.com!
We learned some lessons yesterday and, moving forward, we’re going to put them into practice. Thanks for hangin’ in there.