A YouTube video about YouTube itself reveals how that mega-site delivers mega-results.
There's a great deal of talk these days about scaling applications. Whether it's running big data in the cloud or dishing real-time updates out to zillions of smartphones, the ability to harness computing power in virtually any amount, as it's needed, is emerging as a must for just about every IT-intensive enterprise.
Just ask YouTube, the video-distribution unit of Google, which is one of the most IT-intensive Websites in the galaxy. YouTube is a textbook case of scalable IT, ready to blast out video streams by the millions, seemingly glitch-free.
If you've ever wondered how this is possible, just grab some popcorn and point your browser to the site itself, for YouTube is showing a video specifically about how it does what it does. The 38-minute piece features Mike Solomon, one of the company's original engineers, talking early this year at a conference called PyCon, "the largest annual gathering for the community using and developing the open-source Python programming language."
What you'll learn first from Solomon are some impressive statistics. For instance, YouTube now serves some 4 billion videos every day -- almost one for every person on the planet. What's more, every minute of every day the company's servers digest 60 hours of additional video footage. And yet, even though the number of videos YouTube hosts has grown by nine orders of magnitude since the company launched in 2005, the number of developers working there has grown by only two orders of magnitude.
As you can probably guess, the bulk of YouTube's code -- measuring around 1 million lines, in total -- is written in Python, a general-purpose language known for its flexibility and extensibility. Likewise, the company relies heavily on open-source programs such as the Linux operating system, Apache Web Server, and MySQL database manager.
In fact, Google has developed and open-sourced a front-end for MySQL that helps that DBMS to scale well in Web-based applications. Called Vitess, this software is designed to help a small number of database nodes to handle on the order of 10,000 connections at once and respond to tens of thousands of queries per second.
Faking it
YouTuber Solomon describes many of the approaches and tricks his team uses to achieve YouTube's amazing scale. This starts with a general philosophy of keeping things simple, which also helps keep things flexible.
"The minute you over-specify something, you paint yourself into a corner," comments Todd Hoff, who blogs at High Scalability and who brought this video to my attention. "You aren't going to make those guarantees. Your problem becomes automatically more complex when you try and make all those guarantees. You leave yourself no way out."
One surprising revelation: YouTube isn't above faking data for the sake of keeping viewers happy without excessive development efforts -- an "awesome technique," enthuses Hoff. "The fastest function call is the one that doesn't happen. When you have a monotonically increasing counter, like movie view counts or profile view counts, you could do a transaction [after] every update. Or, you could do a transaction every once in a while and update by a random amount, and as long as it changes from odd to even, people would probably believe it's real."
This, I have to say, is dismaying. I have never believed absolutely everything YouTube ever showed me, but now I really have to wonder, did 3,459 people really watch that video of my singing cat I posted last year? I was so sure I had a minor hit on my hands, but now, I guess I'll never know.
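For the curious, the counter trick Hoff describes might look something like the Python sketch below. This is an illustration under stated assumptions, not YouTube's actual code: the class name SampledViewCounter, the views_per_write parameter, and the in-memory stored_count that stands in for a database row are all invented for the example. Roughly one view in every views_per_write triggers a write, and that write bumps the count by a random step whose average equals views_per_write, so the total stays close to the truth over time while most views cost no transaction at all.

```python
import random


class SampledViewCounter:
    """Approximate view counter that writes to storage only occasionally.

    Hypothetical sketch: stored_count stands in for a database row, and
    each change to it stands in for one transaction.
    """

    def __init__(self, views_per_write=100, rng=None):
        self.views_per_write = views_per_write  # assumed tuning knob
        self.stored_count = 0    # the value viewers would see
        self.transactions = 0    # how many writes actually happened
        self.rng = rng or random.Random()

    def record_view(self):
        # Most views do nothing: the fastest call is the one that
        # doesn't happen.
        if self.rng.randrange(self.views_per_write) != 0:
            return
        # Once in a while, add a random step. A uniform integer in
        # [1, 2 * views_per_write - 1] averages views_per_write, so the
        # expected total matches the true number of views, and the
        # result lands on odd and even values alike.
        step = self.rng.randint(1, 2 * self.views_per_write - 1)
        self.stored_count += step
        self.transactions += 1


if __name__ == "__main__":
    counter = SampledViewCounter(views_per_write=100)
    for _ in range(100_000):
        counter.record_view()
    print("displayed count:", counter.stored_count)
    print("transactions:", counter.transactions)
```

The displayed number only ever goes up, which is what a viewer expects of a view count, but it is an estimate: any single video's figure may run somewhat high or low. That is the trade being made, a little accuracy in exchange for far fewer writes.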
How does your IT shop keep its apps scaling through thick and thin?