Thursday, March 15, 2007

People have asked me how StreamCruncher performs. Until today I did not have a clear answer because I did not have access to a good Server/PC. Last week I managed to borrow (just for a few minutes) a high end Intel/Window2K Server - 4 CPU, 3.6 GHz each with a total of 3 GB RAM.

Out of curiosity, I ran the streamcruncher.test.func.h2.H2TimeWindowFPerfTest test with all the default configurations to JVM, StreamCruncher 1.08 Beta, H2 Database etc, Parallel GC but with just 1 Collector Thread etc.

The test pumps 250 events in one burst, pauses for 6 seconds and repeats this for 4 cycles. So, it does 4 iterations of 250 events each. The test is a simple Straight Through Processing. A simple Time Based Anonymous Partition, which holds Events for 5 seconds. The Test measures the average time it takes for an Event to get expelled from the Kernel (publish by Kernel) since the time it was created. This does not include the 5 seconds it stays in the Window. It measures the Overhead added by the Kernel processing.

Keep in mind that StreamCruncher is still in Beta. There are still a few rough edges to be taken care of - StreamCruncher has neither been Profiled nor have load & performance tests ever been run.

On my 1.6GHz Win XP Laptop with 1 Gig RAM and JDK 1.6, the average Latency per Event for this test was a disappointing 380 Milliseconds. I was quite sure that the single CPU was bogging down the performance, because the Test creates and inserts live, randomly generated Data and the Kernel which is heavily multi-threaded also shares the same CPU. Performance was below expectations. Since all these operations were in memory, there was practically no opportunity for the CPU to switch between Threads while another Thread was blocking on the Network or Disk. So, there was practically no concurrency.

But when I ran the same test on the 4 CPU Box, the average CPU utilization did not even change noticeably while spitting out Events at a fantastic 7-9 Millisecond latency. I was blown away by the numbers. Preliminary though it may be, the multi-threading change in Release 1.05 must've really paid off. Another important thing was that the Tests were printing verbose output to the Console, which slows down the whole application. So, with Zero logging or logging to a file, latency might've improved even further. However, there was an ugly bug that reared its head when I re-directed Output to the File. I kept getting a "Unique Constraint Violated" error quite often. This has to be fixed soon. Since H2 Database was used in these tests, which in its current version does not support Multi-threaded access I'm hoping performance of StreamCruncher will be exceptional on other In-Memory Databases like Oracle TimesTen.

Sometime over the next few months I'll conduct proper tests on a stabler version of StreamCruncher and publish my findings.


Unknown said...

Hello Ashwin

Very interesting stuff you're posting here. And quite amazing the difference between 1 cpu and 4 cpu, well but... I'm still sitting here and wondering ;-)

What were you exactly testing for, besides performance? I would imagine that there could be a whole bunch of different queries!

Ashwin Jayaprakash said...

Thanks! I just wanted to see if the multi-threaded kernel could scale up. And it looked like it could.