On Friday 10th December, YDN was at Scalecamp London 2010, held at the Guardian offices, and it was a blast. Scalability and performance are art forms that are close to our hearts at Yahoo!, so we felt both honored and obligated to sponsor the event. There was a good turnout for a cold December day, with approximately 120 campers. The event was organized by Michael Brunton-Spall from the Guardian. Big thanks to Michael for putting on the event for us!
The first talk that I attended was hosted by Gareth Rushgrove, and covered the often overlooked subject of scaling teams. The debate focused on three key areas: the human factor, development process, and technical architecture.
Most folks agreed that focusing too much on technical skills vs. team fit when hiring is a common factor in failing to scale teams effectively. Also, once you have people on board, you have to look out for sources of discontent. Some people are miserable by nature, and for others the working environment just doesn't suit them. When you see unhappy and miserable folks, try to solve their problems first, but if you just can't get them on board and enthusiastic about working at your company, then you are better off without them, no matter how good they are technically. Misery is like a disease that will spread to the rest of your team fast. Ignore it at your peril.
One attendee noted that he once worked for a company where the CTO regularly told his staff that, if they were not enthusiastic and excited about working for the company, he'd give them £500 to leave at any time. I'm not sure what HR would think, but I understand the sentiment. Happy and enthusiastic teams can be incredibly productive, and any influences that work against this should be eradicated like a pest.
When it came to development process, most agreed that some form of agile process such as Scrum, Kanban, or Scrumban was the most effective way to deal with rapidly growing teams. However, whatever your process, buddying new folks with existing team members was key to successful onboarding. But you must keep in mind that adding new people to existing teams slows down your velocity for a while before you see any upside. No process I've come across can subvert Brooks's Law.
From a technical architecture perspective, most agreed that avoiding the Big Ball of Mud architecture was key to having many people working on a system at once. Break up your architecture into loosely coupled components with clean interfaces, and you'll have a better chance of success. Also, make sure that you have good unit-test coverage; without it, new people will lack the confidence to dive in. With it, you'll also avoid the otherwise overwhelming terror on release day.
As you would expect at a scalability event, there were many sessions from folks using Hadoop to do some really interesting stuff. The one that stood out for me was run by Matt Biddulph from Nokia. He talked about how Nokia has masses of data (some of it real-time) from people using phone-based navigation systems in cities around the world, and how he uses Hadoop to analyze patterns in that data. He describes himself as Nokia's "Data detective": he roams around collecting other teams' logs, or "droppings", and analyzes them for "interestingness." I think every company should have one of these, and I bet that they'd have no shortage of volunteers too. Cool job!
Matt showed us a number of info/heat-maps that were based on Hadoop-generated data. These showed all sorts of things, from traffic patterns at different times of day to the distribution of Starbucks and bars in cities around the world. He even had data on how often people hovered over certain map tiles. It was clear that there's lots of potential for this type of data.
Nokia have more smartphone handsets out in the world than any other provider. If you look at this in the context of their acquisition of Navteq, their comprehensive Place Registry of places in the world (hotels, bars, etc.), and the demographic information that they have about smartphone users, then it's clear that Nokia is in a great position to use Hadoop to identify interesting patterns in global geo-data and fuel innovation in the development of new smartphone applications. I can't wait to see the fruits of this effort!
Good Old Filesystems
Yahoo! developer Darren Foreman attended a session run by Richard Jones from IRC Cloud in which we were all expecting a good old debate about the relative merits of the various NoSQL databases. This is what he had to say about the session:
NoSQL is a catch-all term used to describe databases such as Cassandra and MongoDB and, as expected at a scalability conference, talk of these permeated many of the sessions. These technologies differ from databases such as MySQL in that they offer a performant and scalable way to host and serve data for which a full-blown relational model is not required (typically the classic "key/value pair" structure). However, a somewhat more surprising theme also came through: that of using raw files on disk as a data storage mechanism.
In particular, this was championed in the session hosted by Richard Jones from IRC Cloud. Using a real-world problem currently facing the development of his site's architecture, he kicked off a discussion about ways to store and retrieve large (potentially exponentially growing) amounts of data. The participants generally agreed that if the characteristics of your data allow you to use a hierarchy of directories and files for storage and retrieval, then you should do so. There's life in the 'old technology' of filesystems yet!
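To make the "hierarchy of directories and files" idea concrete, here is a minimal sketch of a key/value store built on nothing but the filesystem. It is not what IRC Cloud or anyone in the session described; the function names (`path_for`, `put`, `get`), the SHA-1 fan-out scheme, and the sample key are all assumptions made up for illustration.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Optional


def path_for(root: Path, key: str) -> Path:
    """Map a key to a file path under a two-level directory fan-out (hypothetical scheme)."""
    digest = hashlib.sha1(key.encode("utf-8")).hexdigest()
    # The first two pairs of hex characters choose the directories, so no
    # single directory ends up holding every file.
    return root / digest[:2] / digest[2:4] / digest


def put(root: Path, key: str, value: bytes) -> None:
    """Store a value as a file, creating parent directories as needed."""
    target = path_for(root, key)
    target.parent.mkdir(parents=True, exist_ok=True)
    # Write under a temporary name and then rename, so a reader never
    # sees a half-written file.
    tmp = target.with_name(target.name + ".tmp")
    tmp.write_bytes(value)
    tmp.replace(target)


def get(root: Path, key: str) -> Optional[bytes]:
    """Return the stored bytes, or None if the key has never been written."""
    target = path_for(root, key)
    return target.read_bytes() if target.exists() else None


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp_dir:
        root = Path(tmp_dir)
        put(root, "channel-42/2010-12-10", b"hello scalecamp")  # made-up key
        print(get(root, "channel-42/2010-12-10"))
```

Hashing the key spreads files evenly across directories, but it gives up the ability to list related keys together. If your keys already have a natural hierarchy (say, user, then date), using that hierarchy directly as the directory layout may serve you better.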
I didn't attend this session by James Aylett, but after reading about it afterwards, I wished I had. He ran a session following up on his Scalecamp 2009 session where he got people to "act out" a TCP exchange. The purpose of this was to show the value of dramatization in the education of people about technology.
In this 2010 session, he spent the time coming up with a list of networking concepts that could be dramatized in this way, such as the TCP handshake, ARP, and so on. This sounded like fun and could be useful to universities and schools as teaching material. He started a Google Group for people who want to get involved.
Scalability Test Suite
The final session that I attended was a discussion on what a scalability test suite would look like. As you would expect, we started by discussing the value of JMeter, and quickly ran into the fact that requesting the same page over and over again isn't the most representative measure of how an infrastructure handles real traffic at high load. If you use caching in your application, you're really just testing the cache after the first request, and we already know how awesome Varnish is at serving cached items.
Michael Brunton-Spall then talked a bit about how the Guardian do this. They collect and re-run real access logs, say from a busy Friday, on a scaled down version of their infrastructure to measure for capacity and HTTP response patterns. A number of others in the room noted that they'd had some success with the Tsung open-source multi-protocol distributed load testing tool.
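As a rough illustration of the log-replay idea, the sketch below turns access-log lines into a replay schedule: a list of delays and paths that a load generator could then fire at a test environment. It is not the Guardian's tooling, which wasn't shown in the session. It assumes the logs are in Common Log Format, and the `replay_schedule` function, its `speedup` parameter, and the sample lines are all invented for the example.

```python
import re
from datetime import datetime

# Matches the timestamp and request line of a Common Log Format entry
# (assumed format; adjust the pattern for your own logs).
LOG_LINE = re.compile(r'\[(?P<ts>[^\]]+)\] "(?P<method>[A-Z]+) (?P<path>\S+)')


def replay_schedule(lines, speedup=1.0):
    """Return (delay_seconds, path) pairs for the GET requests in a log.

    Delays are measured from the first GET; speedup > 1 compresses time.
    """
    schedule = []
    first = None
    for line in lines:
        match = LOG_LINE.search(line)
        if not match or match.group("method") != "GET":
            continue  # skip malformed lines and anything that is not a GET
        when = datetime.strptime(match.group("ts"), "%d/%b/%Y:%H:%M:%S %z")
        if first is None:
            first = when
        delay = (when - first).total_seconds() / speedup
        schedule.append((delay, match.group("path")))
    return schedule


# Made-up sample lines, not real traffic.
SAMPLE = [
    '10.0.0.1 - - [10/Dec/2010:09:00:00 +0000] "GET /uk HTTP/1.1" 200 5120',
    '10.0.0.2 - - [10/Dec/2010:09:00:02 +0000] "GET /world HTTP/1.1" 200 4096',
    '10.0.0.3 - - [10/Dec/2010:09:00:03 +0000] "POST /comment HTTP/1.1" 302 0',
    '10.0.0.1 - - [10/Dec/2010:09:00:04 +0000] "GET /uk HTTP/1.1" 200 5120',
]

if __name__ == "__main__":
    for delay, path in replay_schedule(SAMPLE, speedup=2.0):
        print(f"{delay:5.1f}s  GET {path}")
```

Because the schedule preserves the original mix of URLs and their relative timing, a replay exercises cache misses as well as hits, which is exactly what hammering a single page cannot do.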
All in all, Scalecamp was a very enjoyable day with some very smart people. I love the "camp" format, as it gives opportunities for new talent in the industry to be recognized and affords a conversational and inclusive style of session that conventional conferences don't. I'm looking forward to Scalecamp 2011, and hope that Yahoo! will continue to play a part in it.
Editor's Note: We would be remiss not to credit economist E. F. Schumacher's Small Is Beautiful: Economics As If People Mattered for our play on his book title.