Sunday, November 21, 2010

Scalable compute & storage frameworks - A Refcard (in progress)

If you have been closely following the NoSQL space, or have shown even a mild interest in scalable technologies such as Compute Grids, Data Grids, Distributed Caches or the countless other terms that people use interchangeably, you have probably realized that most architects do not have the time or the resources to sift through the noise and decide what to use.

Since I've had some experience using one such framework and also because I follow the progress of some others, I thought it would be helpful to everyone if I put together some information.

Please share and contribute information. Spread the word. Your efforts will be acknowledged. Ask for permission to work on the Wiki and Spreadsheet.
I have created a Google Code project, scalable-frameworks, which I hope to find the time to keep updated. I would also like to enlist help from the community at large to gather correct information.
  • The intention is for it to serve as a ready reckoner and not be complete or authoritative
  • Performance is a criterion that has consciously been excluded from the lists here to avoid flame wars
  • For the full information it would be best to visit the actual product's/project's website
  • It is not official, nor is it the product of thorough research
  • If you have questions or would like to clarify/contribute, please get in touch
To start with, there are two parts:
  1. A very simple introduction with images describing the basic concepts
  2. A spreadsheet that is meant to serve as a ready reckoner - to help you choose the right framework/platform
    • It has some basic features listed
    • Pay attention to the features that you would find most useful and pick the project that has most/all the ones you are looking for
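
As a rough illustration of how the spreadsheet is meant to be used, here is a minimal Python sketch that ranks projects by how many of your wanted features they cover. The framework names and feature labels are hypothetical placeholders, not rows from the actual spreadsheet.

```python
# Hypothetical rows; the real spreadsheet has its own projects and columns.
frameworks = {
    "framework-a": {"replication", "querying", "change-notifications"},
    "framework-b": {"replication", "compute-grid"},
    "framework-c": {"replication", "querying", "change-notifications", "compute-grid"},
}

# The features you care about most.
wanted = {"replication", "change-notifications", "compute-grid"}


def rank(frameworks, wanted):
    """Return project names ordered by how many wanted features they offer."""
    scored = [(len(features & wanted), name) for name, features in frameworks.items()]
    # Most matches first; ties are broken alphabetically by name.
    return [name for score, name in sorted(scored, key=lambda s: (-s[0], s[1]))]


ranking = rank(frameworks, wanted)
print(ranking)  # ['framework-c', 'framework-a', 'framework-b']
```

A plain count treats every feature as equally important. If some features are must-haves, filter on those first and rank the remainder.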

Basic concepts:
To help understand the basic processing and storage idioms being explored, here are a few images:

"Store and retrieve" (Scatter-Gather) on a cluster of compute + storage nodes:

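For those who prefer code to pictures, here is a minimal in-process sketch of the scatter-gather idea in Python. It is only my reading of the idiom, not how any particular product implements it: the `Node` and `Cluster` classes are hypothetical, nodes are plain dicts, and partitioning is a simple hash of the key.

```python
from concurrent.futures import ThreadPoolExecutor
from zlib import crc32


class Node:
    """One compute + storage node, modeled as an in-process dict."""

    def __init__(self, name):
        self.name = name
        self.data = {}

    def put(self, key, value):
        self.data[key] = value

    def query(self, predicate):
        # Each node scans only its own partition.
        return [(k, v) for k, v in self.data.items() if predicate(k, v)]


class Cluster:
    def __init__(self, node_count):
        self.nodes = [Node(f"node-{i}") for i in range(node_count)]

    def owner(self, key):
        # Stable hash, so the same key always lands on the same node.
        return self.nodes[crc32(key.encode("utf-8")) % len(self.nodes)]

    def store(self, key, value):
        # Scatter: route each entry to the node that owns its key.
        self.owner(key).put(key, value)

    def retrieve(self, predicate):
        # Gather: ask every node in parallel, then merge the partial results.
        with ThreadPoolExecutor(max_workers=len(self.nodes)) as pool:
            partials = list(pool.map(lambda n: n.query(predicate), self.nodes))
        return sorted(item for partial in partials for item in partial)


cluster = Cluster(node_count=3)
for i in range(10):
    cluster.store(f"order-{i}", i * 10)

large_orders = cluster.retrieve(lambda k, v: v >= 70)
print(large_orders)  # [('order-7', 70), ('order-8', 80), ('order-9', 90)]
```

In a real cluster the nodes are separate processes or machines, so the scatter and gather steps go over the network. The shape of the call stays the same.
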
"Store, notify changes, apply changes" (Scatter-Relay-Compute) on a cluster of compute + storage nodes:

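Here is a small Python sketch of how I read "store, notify changes, apply changes". All names are hypothetical (`Node`, `RunningTotal`, `store`), and I am assuming that each node relays every write to its subscribers, which then apply just the change to some derived state.

```python
from zlib import crc32


class Node:
    """A storage node that relays every write to its subscribers."""

    def __init__(self, name):
        self.name = name
        self.data = {}
        self.listeners = []

    def put(self, key, value):
        old = self.data.get(key)
        self.data[key] = value
        # Relay: tell every subscriber what changed.
        for listener in self.listeners:
            listener(key, old, value)


class RunningTotal:
    """Hypothetical subscriber that keeps a derived total up to date."""

    def __init__(self):
        self.total = 0

    def on_change(self, key, old, new):
        # Compute: apply only the delta, so no full rescan is needed.
        self.total += new - (old or 0)


nodes = [Node(f"node-{i}") for i in range(3)]
total = RunningTotal()
for node in nodes:
    node.listeners.append(total.on_change)


def store(key, value):
    # Scatter: a stable hash of the key picks the owning node.
    nodes[crc32(key.encode("utf-8")) % len(nodes)].put(key, value)


store("acct-1", 100)
store("acct-2", 50)
store("acct-1", 120)  # an update: only the +20 delta is applied
print(total.total)  # 170
```
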
"Store, notify changes, calculate, notify new calculation result" (Scatter-Relay-ComputeAlert) on a cluster of compute + storage nodes: 

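The last idiom adds one more hop: the calculation publishes its own result. Here is a minimal Python sketch under the same assumptions as before, namely that everything runs in one process, the names (`Node`, `ExposureCalculator`, `store`) are hypothetical, and each node relays its writes to subscribers.

```python
from zlib import crc32


class Node:
    """A storage node that relays every write to its subscribers."""

    def __init__(self, name):
        self.name = name
        self.data = {}
        self.listeners = []

    def put(self, key, value):
        old = self.data.get(key)
        self.data[key] = value
        for listener in self.listeners:
            listener(key, old, value)


class ExposureCalculator:
    """Hypothetical calculation that recomputes on change and publishes the result."""

    def __init__(self):
        self.total = 0
        self.result_listeners = []

    def on_change(self, key, old, new):
        # Calculate: fold the change into the running result.
        self.total += new - (old or 0)
        # Alert: notify subscribers of the new calculation result.
        for listener in self.result_listeners:
            listener(self.total)


nodes = [Node(f"node-{i}") for i in range(3)]
calculator = ExposureCalculator()
for node in nodes:
    node.listeners.append(calculator.on_change)

# A subscriber that simply records every published result.
published = []
calculator.result_listeners.append(published.append)


def store(key, value):
    # Scatter: a stable hash of the key picks the owning node.
    nodes[crc32(key.encode("utf-8")) % len(nodes)].put(key, value)


store("trade-1", 400)
store("trade-2", 700)
store("trade-1", 100)
print(published)  # [400, 1100, 800]
```

A subscriber can of course filter what it receives, for example by raising an alarm only when the published value crosses a limit.
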
(Full Spreadsheet)

Until next time!