One of the Web's biggest sites reveals why it built its own cloud yet still relies on shared IT services.
Common wisdom would have it that public providers of cloud-based computing are in a league of their own, able to run IT infrastructure more efficiently than most companies can do for themselves. After all, the story goes, these cloud outfits specialize in IT -- they live and breathe technology day in, day out -- so how could any non-specialist do it better than they can?
In fact, this isn't necessarily true, as explained in an article just published at Ars Technica. It describes the history and evolution of IT infrastructure at the social-network gaming company Zynga, which you may know from popular titles such as FarmVille and ZyngaPoker. Over its five years in existence, Zynga has gone from running its own servers to renting a whole mess of them from Amazon's EC2 unit to running its own again.
As Ars Technica explains things, Zynga launched in 2007, running its own servers in the space it rented at a co-location facility. All was well. But in 2009, the company had a giant hit on its hands with FarmVille. The number of players rocketed to 10 million in six weeks, and then to 25 million in five months. "We couldn't get power fast enough. We couldn't get servers fast enough," Allan Leinwand, Zynga's CTO of infrastructure, is quoted as saying.
Thankfully, Amazon was able to provide the capacity and scalability that Zynga needed to thrive during what turned out to be a massive ramp-up in usage. But then the gaming company figured it could run its own infrastructure in a way better suited to its business than Amazon's service was.
Zynga built a datacenter on each coast of the US, and by early 2011, about 20 percent of its users were playing games running there, not in the Amazon cloud. And by late 2011, it was 80 percent Zynga servers and 20 percent Amazon. In just two years, the company grew its server count by a factor of 50, yet it still retains Amazon as a "shock absorber" to help cope with sudden spikes in demand.
Zynga describes its "zCloud" as being managed with CloudStack software, an orchestration package recently released to the open-source community by Citrix Systems. Helping to design the setup was a company called RightScale. Hypervisor? A customized version of Xen.
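The article doesn't show any of zCloud's configuration, but CloudStack's orchestration is driven by a signed HTTP API: every request carries an HMAC-SHA1 signature computed over its sorted, lowercased query string. A minimal sketch of that signing step (the keys here are placeholders, and this is an illustration of the documented scheme, not Zynga's tooling):

```python
import base64
import hashlib
import hmac
import urllib.parse


def sign_request(params, secret_key):
    """Compute a CloudStack-style API signature for a dict of request params.

    Per the CloudStack API convention: sort parameters by name, URL-encode
    the values, lowercase the whole query string, HMAC-SHA1 it with the
    account's secret key, and base64-encode the digest.
    """
    query = "&".join(
        f"{k}={urllib.parse.quote(str(v), safe='')}"
        for k, v in sorted(params.items())
    )
    digest = hmac.new(
        secret_key.encode(), query.lower().encode(), hashlib.sha1
    ).digest()
    return base64.b64encode(digest).decode()


# Hypothetical call; a real client would append this signature to the request.
sig = sign_request(
    {"command": "listVirtualMachines", "apikey": "EXAMPLE-API-KEY"},
    "EXAMPLE-SECRET-KEY",
)
```

Because the parameters are sorted before hashing, the signature is independent of the order in which a client assembles them.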
Interestingly, Zynga's Leinwand wouldn't tell Ars Technica what kind of servers the company is using, or even how many datacenter facilities it has built for itself. He says the company is not building its own servers but procures them from suppliers that are able to do some amount of customization to help match hardware to different games' operating characteristics.
With "dozens of tweaks" and careful analysis of server and network performance, Zynga can run on a single server workloads that would require three servers at Amazon.
Leinwand tells Ars Technica: "We dug into our applications. We wrote tools that got into the memory heap. We wrote tools that look at profiling certain processes. We got into the Linux kernel. We use CentOS, and we got into the CentOS kernel and figured out where those bottlenecks are."
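The article doesn't describe what Zynga's in-house tools looked like, but on Linux the kind of per-process inspection Leinwand mentions typically starts at the /proc filesystem. A minimal sketch (the function name is my own, and real profiling tools go far deeper, into heap dumps and kernel counters):

```python
def rss_kib(pid="self"):
    """Return a process's resident set size in KiB, read from /proc.

    VmRSS in /proc/<pid>/status is the physical memory the process
    currently occupies -- a first, crude window into memory behavior.
    """
    with open(f"/proc/{pid}/status") as f:
        for line in f:
            if line.startswith("VmRSS:"):
                return int(line.split()[1])  # field is reported in kB
    return None
```

Called with no argument, it reports the calling process's own footprint; pass a PID to inspect another process.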
Another intriguing factoid: Zynga's core database clocks in at 1.4 petabytes, organized into 24.5 trillion rows of data. That may sound as though the company is recording every click of every mouse that ever gets near one of its online games, a real exercise in big data. But it's not. Last year, on its own engineering blog, Zynga discussed some of its thinking in this area, noting that it was recording "only" 5TB or so of data every day, not the 50TB its gamers generate.
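Those figures are worth a moment of arithmetic. Taken at face value (and assuming decimal units, i.e. 1 PB = 10^15 bytes), they imply remarkably lean rows and a heavy filter on what gets kept:

```python
PETABYTE = 10**15  # assuming decimal (SI) units, not binary
TERABYTE = 10**12

db_bytes = 1.4 * PETABYTE      # core database size from the article
rows = 24.5 * 10**12           # 24.5 trillion rows
avg_row_bytes = db_bytes / rows  # roughly 57 bytes per row

recorded_fraction = 5 / 50     # 5TB kept of ~50TB generated daily, i.e. 10%
```

In other words, the average row is only a few dozen bytes, and Zynga discards about nine-tenths of the raw event data its games throw off each day.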
Clearly, not every enterprise is in the gaming business and serving and tracking hundreds of millions of active users. But it's interesting, if not necessarily applicable, to read about how some of the Web's bigger operators actually operate.