Monday, May 11, 2009

All your BASE are belong to the Cloud

If you haven't heard about the CAP vs BASE debates yet, then you are definitely not following the "cloud" or distributed systems space very closely.

Before you wander off following these links, let me tell you briefly what CAP and BASE stand for:
CAP = Consistency, Availability, Partition-tolerance
BASE = Basically Available, Soft state, Eventual consistency

Apparently, the CAP theorem says that you can only have any 2 of the 3 at any point when you are talking about really large scale, horizontally scaled software. And this article explains quite clearly why that is so.

Dr. Eric Brewer of Inktomi (and Berkley) fame had described an alternative way back in 1998 called BASE.

He said that your software has to be aware of its distributed nature. Instead of relying on the Database or some other centrally (single point of failure) shared resource like a File system to provide ACID guarantees, build your software such that it can accommodate failures of increasing severity. Only then can you really scale out.

There is plenty of material on the Web to do this research. I am aware of people using Distributed Caches and its variants on massive scales such as - Memcached, Hadoop, Cassandra, Voldermort etc. On the other hand I have personally seen people using Distributed Caches running alongside traditional OLTP systems - almost the other end of the spectrum where you have modest 20-30 nodes.

However, I was wondering what people/projects in the middle of this cluster-scale are doing. I found 2 interesting presentations:
1) Financial Transaction Exchange at
2) Forging ahead - Scaling the BBC into Web/2.0

The first BetFair presentation doesn't really talk about clusters, per se. But they do talk about their experience with various scalability models they attempted. Their architecture seems to be smack dab in the middle of this scale. They still use a single large Database for most of their work.

The second one is moderately interesting because they are very much like these other massive scale Web 2.0 systems. I have great respect for such systems, but I've already read about such things like Amazon, Flickr, Facebook and the like. But one really interesting thing they have done is use markers (like Airport Threat Level colors :-) to identify the health of the servers in the system. Based on this they decide what to do with the user requests. This is perhaps the clearest application of BASE that I've read about so far. Amazon's Dynamo also does this, I believe - but they use Vector clocks and other algorithms.

Anyway.. interesting reading.