When customers buy any of DreamHost’s managed services, from shared hosting to VPS solutions to dedicated servers, they’re buying the reassurance that we’ll take care of all the behind-the-scenes stuff so they can focus on what really matters: their website.
Managing a fleet of servers across four data centers to support more than 1.5 million websites — with 99.98% uptime — is no small task.
So how does DreamHost make sure all those sites stay humming when there’s a software package to upgrade nearly every week? I sat down with DreamHost’s Director of Technology, Nate Michael, to learn how his team keeps your site safe.
The TechOps team oversees operations for all of DreamHost’s hosting products. How do you manage such variety?
DreamHost started automating operations in the data centers very early on, and that allows us to tame the complexity. We use custom-built tools, alongside standard ones like Chef, to manage the clusters in a predictable and replicable way.
The shared hosting clusters, for example, are all built using the same recipes and operating system images. A cluster is made of different types of server hardware, depending on its role: some hardware is optimized for storage, some for computing power, and so on. The important part is that each component is quite homogeneous.
For the software part, we use a customized version of Ubuntu. When new hardware is added to the cluster, we install this custom Ubuntu image and let Chef and our tools automatically integrate the machines into the cluster. It’s crucial for my team to reduce the complexity at every step, starting from the hardware. We tend to buy similar hardware and stick to mature software versions for as long as possible. That’s what allows us to offer more services to our customers without charging premium rates.
Your team is always stuck between adding new features and maintaining stability. How do you cope with things like upgrades, starting from the kernel to the application layers?
The way we build our custom Ubuntu installation allows us to take full control of the stack that powers our customers’ websites. Our Technical Operations team runs a custom build of the Ubuntu Linux kernel that includes licensed technology from Ksplice, which allows us to patch the kernel without rebooting machines. This is crucial when you run a fleet of servers that power disparate services.
On top of the kernel, we also maintain the basic components needed for running websites: Apache and nginx, plus various versions of PHP, Python, Perl, and more. This allows us to be independent from the underlying Ubuntu version and support more variety to serve our customers’ needs.
It seems like what you call Ubuntu is not really the regular Ubuntu distribution. How important is it to keep up with the latest version of the base operating system?
We might as well call our base operating system “Dreambuntu,” given how much we customize it. There are more than 20 people keeping tabs on everything that is needed to keep systems running safely and steadily.
To give an example, when recent vulnerabilities were announced for basic pieces like the Linux kernel (Dirty COW), OpenSSL (Heartbleed), and many others, we quickly built packages and backported fixes to all the packages that our customers run. That includes versions of the software that may not be supported upstream but are supported by DreamHost. It’s a tough job, but in the overall scheme of things it’s a job that pays off for us. Most of our customers care that DreamHost offers systems that are stable and can run their websites reliably. The less they have to worry about kernel or Apache versions, the better.
Reassuring to hear that there is a team working to maintain the packages. Are these people all in the TechOps team?
Besides the TechOps team, DreamHost also has a team dedicated to security: the Nightmare Labs team. Among other things, they actively monitor the white/grey hat hacker community, keep tabs on common vulnerabilities and exposures, and respond to threats when they arise. We work together, in collaboration with our vendors, to keep our customers safe.