Monday, March 26, 2007

Here's another reason why using an RDBMS in StreamCruncher was not a bad design decision at all. RDBMSes (what's the plural of RDBMS?) have attained a status (in Middleware) that is unequalled by any other technology. We trust RDBMS to store our Bank balances, a nation's entire Census data, Criminal records and the list goes on and on. So, when such a technology is always within easy reach, common sense tells us that it should not just be used, but embraced...Ok, enough of that and moving to the point ...

Many people who are familiar with ESP and CEP, when they learn that StreamCruncher uses an RDBMS underneath, probably wince at the mere mention of a Database. To them Databases are good but not good enough for Stream Processing. To many RDBMS is the anti-thesis of Stream Processing. Why? They begin to express vehemently about the "presumed" drawbacks of such an architecture. They start lecturing about Performance. Speed. "Sub-millisecond latency cannot not be achieved in regular RDBMS".. and so on.

Well, they're not entirely true. Even if you ignore the fact that StreamCruncher already supports several Embedded, In-Memory, Real-time Databases that are routinely used in Telecom (the DBs), there is a classic solution for regular RDBMS that will make them as good as any In-Memory Database. The biggest hurdle is the Hard disk - Persistence. Databases are meant to store data for posterity. But storing data to the Disk also adds a lot of latency. All other things offered by the Database like concurrency, scalability etc are very much required for Event Processing. Persistence is not required for Event Stream Processing. So, a regular Enterprise class Database can still be used as the base for StreamCruncher, which provides a solid/robust foundation on which to perform ESP/CEP - all this by creating the Database Tablespace on what is called a RAMDisk. Which is just a soft-drive, where a portion of the Physical Memory/RAM is turned into a Storage Drive (like C:\ or /usr/home/myname). This acts as any other Drive where files can be created etc but everything gets wiped out when the Machine is rebooted. This technique is not something new, but fits perfectly well in the context of StreamCruncher. RAMDrives can be created for almost any Operating System. A simple search on the Internet reveals several products and techniques for creating RAM Drives.

So, all the licensing costs that a Company has incurred on acquiring and maintaining these "Big Daddy" Databases can still be leveraged for Event Stream Processing. In the end, StreamCruncher gets to use a time tested Database that Developers have been using in their other regular Projects and can rely on the stability of such DBs. The important Sorting, Pre-Filtering, Joining and other CPU-intensive operations happen in the Database Engine, which is in Native code. And of course the Developers' familiarity with such Databases also plays a crucial role in adoption and integration with other parts of the project.

0 comments: