Thursday, March 18, 2010
Digging up links to my old Event Stream Processor
By Ashwin Jayaprakash
Topics: #event processing, #java, #note to self, #streamcruncher, tech
I was digging up links and articles about my old StreamCruncher hobby project. Cleaning up my bookmarks, rather. Here's what Google came up with (after some manual filtering):
- Paul Dekker's Master's thesis on Complex Event Processing with StreamCruncher, RuleCore and Esper as case studies
- Edson Tirelli's blog entry (one of the Drools guys)
- An Introduction To Data Stream Query Processing from Truviso
- Creating an event driven SOA
- Marco on in-memory DBs for CEP
- The SQL debate on Financial Techinsider
- Zepheira presentation
Later.
Friday, September 04, 2009
Windowing features in PostgreSQL 8.4
PostgreSQL recently added a whole bunch of mini-analytic features in version 8.4. It also happens to be Turing complete with support for recursion. Ha! Recursion in SQL...
To me, the most interesting feature was the introduction of the partition by and related clauses. I remember using this feature in Oracle 10g back in 2005, as part of its Analytics package if I'm not mistaken. It's important to me because it was what inspired me, in some way, to start working on StreamCruncher and explore other Windowing concepts that are now standard in any Event Stream Processing product.
SELECT key, SUM(val) OVER (PARTITION BY key) FROM tbl;
Wednesday, July 22, 2009
To parse or not to parse
I remember spending a lot of time working on SQL extensions and ANTLR grammar not too long ago. This was during my StreamCruncher days.
Looking back, the effort it required to write a clear, unambiguous grammar, simplify the AST, validate it and then finally produce some executable code was considerable. I distinctly recall that it was not very enjoyable.
Nowadays, the Fluent interface is gaining popularity, especially for implementing mini-DSLs. Here's an introduction to designing DSLs in Java.
For clarity, I'll refer to the custom language/grammar as Stringified-DSL and the Fluent interface as Fluent-DSL.
Now, don't get me wrong - I'm all for designing a concise and crisp "language" for specific tasks. That's the whole reason I built my program on SQL extensions (i.e. Stringified). But does it always have to be a dumb string - something in quotes that cannot be refactored, has no IDE support like auto-complete, and saddles you with the hassle of providing a custom editor?
It doesn't end there. When you expose a custom Stringified-DSL, you and your users are stuck with it for a very long time. If you later want to expose data structures - which your users will eventually ask for, after playing with the simple stringified demos - you will have to ship a new grammar to them, enhance your custom editor and document it all clearly. Also, the underlying language - let's say Java - may suddenly unleash a whole bunch of new APIs, syntax and other language enhancements; which is exactly what happened with Java 5, if you recall: Annotations, the Concurrency libraries, Generics. You see what your users would be missing because you chose the Stringified route?
In case you are wondering why your users would want Concurrency, data structures and such - we are talking about Technical users, not Business users. Business users are happy with Excel or a sexy UI, so their requirements are not the topic here.
So, what are the benefits of exposing a Fluent-DSL?
- Easy to introduce even in a minor version of the product. Quick turnaround because it involves mostly APIs
- Documentation is easy because it can go as part of the JavaDocs
- Easy for the users to consume and use. Lower setup costs and only a mild learning curve
- IDE features - Refactoring, Type safety, no custom editor required
- Base language features - Users don't have to shoehorn their code into a clumsy custom editor; code can be unconstrained and they are free to use data structures, Annotations, Generics, Concurrency, Fork-Join and all the other programming language goodies - which is very good in the long run because you don't paint yourself into a corner with all that custom stuff
A few projects and approaches in this space that are worth a look:
- LambdaJ
- JaQu
- Those damned Closures that never made it
- FunctionalJava - to an extent
- MVEL - my favorite. The Interceptor feature is a nice touch
- Expression Languages like JUEL and SpEL
- MPS from the IntelliJ guys. I haven't understood this fully. It seems to be somewhat like MS Oslo, maybe less. But Oslo itself is hard to understand because of the poor arrangement of their docs
- DSLs in Groovy - although I'm not very convinced since Groovy itself has a learning curve
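To make the contrast concrete, here's a tiny, self-contained Java sketch of the two styles. All the names (Query, select, from, where) are made up purely for illustration - this is not the StreamCruncher API or any particular library:
// A minimal Fluent-DSL style query builder - hypothetical names, just to show the idea.
public class FluentQueryDemo {
    static class Query {
        private final StringBuilder sql = new StringBuilder();
        private Query(String... columns) {
            sql.append("select ").append(String.join(", ", columns));
        }
        static Query select(String... columns) { return new Query(columns); }
        Query from(String table) { sql.append(" from ").append(table); return this; }
        Query where(String condition) { sql.append(" where ").append(condition); return this; }
        @Override public String toString() { return sql.toString(); }
    }

    public static void main(String[] args) {
        // Stringified-DSL: the whole query is an opaque string - no compile-time checks,
        // no refactoring, no auto-complete.
        String stringified = "select symbol, price from trades where price > 100";

        // Fluent-DSL: the same intent as chained method calls - the compiler and the IDE
        // can now check, complete and refactor this code.
        Query fluent = Query.select("symbol", "price").from("trades").where("price > 100");

        System.out.println(stringified);
        System.out.println(fluent);
    }
}
A real Fluent-DSL would model the where-predicate as objects too, instead of a string, but even this much buys you type-checked structure and IDE support.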
Cheers!
Monday, August 04, 2008
StreamCruncher source on Google Code
I uploaded StreamCruncher version 2.3 source code to Google Code a few weeks ago. If anyone is interested in maintaining it, you are most welcome.
Wednesday, October 10, 2007
Expression Languages (OGNL and MVEL)
For those of you using Expression Languages in your programs to add pluggable fragments of logic, I'm sure you've evaluated or probably even use OGNL. StreamCruncher uses OGNL 2.7+ to handle some of the messier parts of Expression evaluation and I've found it to be a huge time saver. What's even better is that the 2.7 version also compiles the Expressions into dynamically generated Bytecode.
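As a quick illustration, here's roughly what evaluating an OGNL expression against a plain Java object looks like (a minimal sketch; the exact package names and signatures depend on the OGNL version you pick up):
import ognl.Ognl;
import ognl.OgnlException;

public class OgnlSketch {
    // A simple event bean used as the expression's root object.
    public static class Event {
        private final double price;
        public Event(double price) { this.price = price; }
        public double getPrice() { return price; }
    }

    public static void main(String[] args) throws OgnlException {
        Event event = new Event(105.5);
        // The "pluggable logic fragment" is just a string, parsed once...
        Object expression = Ognl.parseExpression("price > 100.0");
        // ...and evaluated against the root object whenever an Event arrives.
        Object result = Ognl.getValue(expression, event);
        System.out.println(result); // prints "true"
    }
}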
So, if you want to evaluate OGNL, here is a list of useful links. They are not easily locatable, so I thought this would also be a good place for me to bookmark them for future use.
Here's the old link - OGNL
The 2.7+ versions are handled by this guy Jesse - his Blog
The latest releases - on OpenSymphony
I also strongly suggest evaluating another very well done Expression Language - MVEL, which is written by Mike - his Blog
Saturday, October 06, 2007
StreamCruncher 2.2 Release Candidate
The 2.2 Release Candidate is now available. This has some important performance-related changes over the 2.2 Beta version. Over the past few releases, I've spent a considerable amount of time working on parts of the Kernel to perform end-to-end processing without having to go to the Database. This version performs Correlation Query processing and single Stream Query processing entirely in Memory. As a result, there are some things that don't work - like the "Order by" and "Group by" clauses. You might call it laziness, but there's only so much time I can spend on this, what with a day job and all.
Anyway, the performance has shot up to very respectable figures. The CorrelationPerfTest that I spoke about in my previous blog can now process a total of 168,000 Events per second on a single Processor, dual Core 1.8 GHz Centrino with 2 GB Memory. The Test has 3 Correlation Queries. Two of them correlate 3 Streams each and one Query correlates 2 Streams.
It's been a long journey. I'm so glad that SC can do this many Events per second now. I remember being quite worried a year and a half ago, when it could not do more than a few hundred events per second.
Monday, August 20, 2007
StreamCruncher 2.2 Beta
Over the past few weeks I've been meddling with the Correlation engine inside StreamCruncher. The 2.2 Release is a result of that ongoing work. "Ongoing" - thus the Beta status. Nevertheless, this release includes rather drastic changes to the Correlation code (the alert..using..when.. clause): the Kernel no longer requires the Database to perform Pattern matching, thereby increasing performance and decreasing latency several fold.
A new TestCase has been added - CorrelationPerfTest - which demonstrates this wonderful improvement. In this TestCase, there are 4 Streams of Events monitored by 3 different Queries, each looking for a distinct Pattern. Since each Query runs in its own group of Threads, the Test scales well on Multi-Core/CPU systems. This Test also generates and consumes large amounts of data and thus serves as a Stress Test too.
Since this is an interim release, there are a few features that are still being developed - like the "case...when.." clause that used to work in previous releases. Now that the Database is no longer being used for Pattern matching, those features that were freely available in the Database are yet to be replicated by the Kernel.
Wednesday, July 18, 2007
StreamCruncher 2.1 Release Candidate
1) Pre-Filters for Input Event Streams support <, >, !=, =, *, /, +, -, in (..), not in (..), and, or. The in clause can refer to an SQL Sub-Query. Such Sub-Queries are cached by the Kernel to improve performance
2) An additional property cacherefresh.threads.num can be configured to specify the number of Sub-Query Cache processing Threads to use
3) 2 new Test cases have been added to test the new features - H2StartupShutdown3Test and ThreeEventOrderTest
Wednesday, July 04, 2007
StreamCruncher 2.0 Beta is ready!
This 2.0 version is the result of a major refactoring job.
1) The API has been greatly simplified. The internal architecture has changed considerably, resulting in a vast improvement in performance. The TimeWindowFPerfTest (single Query on a Stream) can do 25,500 Events per second on a 1.6 GHz Centrino! More details and log files in my next Blog
2) The plain Window constructs that were available in previous versions have been removed entirely. Partitions are the only construct now. The syntax for Simple (anonymous) Window Partitions has changed slightly, in that there is no by keyword between partition and store
3) An additional property db.schema can be specified as the Database Schema in which the Kernel creates its internal artifacts
4) Chained Partitions, from now on, must always have a Pre-Filter clause starting with $row_status is new/dead
5) The Kernel can now accommodate Events that arrive out-of-order. OutOfOrderEventTest demonstrates this new ability
6) The Pre-Filter temporarily does not support the complete SQL grammar - clauses like in and exists
Phew! It took me more than a month to make these changes. The internal Event stores for Input Event Streams and Partitions have changed considerably. The idea was to keep the Events inside the Kernel's process for as long as possible and delay inserting the Events into the Database until the last stages. As a result, Events get passed around as references most of the time. The latency per Event has dropped dramatically.
There are a few things that still need to be completed/cleaned up - the Pre-Filter clause syntax, for example, and Kernel restarts do not work correctly yet. This version comes with the latest version of the H2 Database, which now supports Table-level concurrency - a much needed feature for StreamCruncher.
Tuesday, May 29, 2007
StreamCruncher 1.14 is available! This release comes with a new feature/syntax - the self# clause, to perform efficient Self-Joins over Streams. Self-Joins are useful when the Events in a Window have to be scanned/matched against Events from the same or other Windows defined as part of the same Partition clause.
The StockPriceComparisonTest TestCase demonstrates the use of this new syntax. More details in the "StreamCruncher Basics" - documentation.
Sunday, April 22, 2007
StreamCruncher 1.13 Release Candidate is ready!
1) This version includes support for Oracle 10g and has been tested on Oracle Enterprise 10.2.0.
10g, being an Enterprise grade Database, requires Tuning by a DB Expert before you start using it as the underlying Database for StreamCruncher. I don't claim to be an Oracle expert, so I'd ask my DBA to set up the Database for very low Latency, deferred Disk flushes and larger Page and Cache sizes, and let him/her translate that into the necessary Oracle Configuration changes. People have been creating TableSpaces on RAM Drives, mostly to host Indexes for Tables that are constantly modified and heavily contended. I'd also think of creating the whole Database on such a RAM Drive.
StreamCruncher also creates Tables and Indexes for internal purposes. You'll have to ensure that the Schema in which these get created (usually the User name provided in the StreamCruncher DB Config file) is on a TableSpace that is Tuned & Configured for this purpose.
Another good thing to tell the DBA is how Events/Rows are operated on in the Database by StreamCruncher. In any Internal Table, Events are mostly pumped in by one Thread and consumed by another Thread - very similar to a Queue or a Conveyor Belt. Updates are done on an Indexed Column, mostly on a small set of Rows that are usually in the Page Cache. All DB access (Insert/Update/Delete) by StreamCruncher on its Internal Tables goes through Indexes - some Unique, some not.
Remember, Oracle Database Tuning is an Industry in itself. Make sure you've tuned your setup well!
2) Another small Concurrency issue in the Kernel has also been fixed - the last of these kinds of issues, hopefully. I've finally got rid of the "Unique Index Violation" errors I used to get only on Multi-processor machines. Version 1.12 had fixed it for Single Processor machines. I also have to admit that this fix affects performance on single Processor machines too, though the increase is only about 10-13%.
Tuesday, April 17, 2007
Event Processing in 2007 and beyond
An Essay, by Ashwin Jayaprakash (Apr 2007)
Website: http://www.streamcruncher.com
Where are we really headed?
The tenets of Event Processing
Online Retail Store
select unful_order_id, unful_cust_id from
alert
    order_event.order_id as unful_order_id,
    order_event.customer_id as unful_cust_id
using
    cust_order (partition by store last 30 minutes
        where customer_id in (select customer_id from priority_customer)) as order_event correlate on order_id,
    fulfillment (partition by store last 30 minutes
        where customer_id in (select customer_id from priority_customer)) as fulfillment_event correlate on order_id
when present(order_event and not fulfillment_event);

select country, state, city, item_sku, sum_item_qty, stock_min_level from
    warehouse (partition by country, state, city, item_sku
        store latest 500 with pinned sum(item_qty) entrance only as sum_item_qty) as stock_level_events,
    stock_level_master
where stock_level_events.$row_status is new
    and stock_level_events.item_sku = stock_level_master.stock_item_sku
    and stock_level_events.sum_item_qty < stock_level_master.stock_min_level;

select country, state, city, item_sku, item_qty, order_time, order_id from
    cust_order (partition by country, state, city store last 30 minutes max 5) as order_events
where order_events.$row_status is dead;

select order_country, order_state, order_category, order_item_sku, order_total_qty from
    cust_order (partition by order_country, order_state, order_category, order_item_sku
        store last 30 days with sum(order_quantity) as order_total_qty)
    to (partition by order_country, order_state, order_category
        store highest 5 using order_total_qty
        with update group order_country, order_state, order_category, order_item_sku
        where $row_status is new) as order_events
where order_events.$row_status is not dead;
Links
- StreamCruncher, an Event Processor
- Prof. David C. Luckham's original Paper
- Prof. Jennifer Widom's Home Page
- Yahoo CEP-Interest Mailing list
Friday, April 13, 2007
Finally!! StreamCruncher 1.12 is ready and it's no longer a Beta version. This is the Release Candidate.
(Also, performance test results - read on. Hint: 8,000 TPS!! on a 1.6 GHz Laptop)
I found the time to review some parts of the Kernel code. It turned out that there were small things here and there that needed fixing. Since the Kernel is heavily multi-threaded, it was important that locking be reduced. As a result, the CAS (Compare and Set) operations (Java 1.5+) are used in many places. This is much faster than actually waiting on a lock and then realising that the logic in the protected section does not have to be executed anyway.
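For readers who haven't used them, here's a rough illustration of the pattern (not the actual Kernel code): with compareAndSet, a thread finds out in one cheap atomic step that the protected section doesn't need to run, instead of first acquiring a lock and only then discovering the same thing.
import java.util.concurrent.atomic.AtomicBoolean;

public class CasSketch {
    // A flag that several threads may race to claim.
    private final AtomicBoolean claimed = new AtomicBoolean(false);

    public void processIfFirst(Runnable work) {
        // CAS: atomically flip the flag from false to true.
        // If another thread got there first, we learn that immediately,
        // without ever parking on a lock.
        if (claimed.compareAndSet(false, true)) {
            try {
                work.run(); // the section only one thread should execute at a time
            } finally {
                claimed.set(false);
            }
        }
        // else: nothing to do - skip the protected section entirely.
    }

    public static void main(String[] args) {
        CasSketch sketch = new CasSketch();
        sketch.processIfFirst(() -> System.out.println("claimed and processed"));
    }
}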
After fixing these issues, I modified the "TimeWindowFPerfTest" class to capture more metrics. Apart from just calculating the Average Latency added to each Event by the Kernel in a Straight/Simple processing case, this Test now calculates the average total time it takes to insert rows into the Database and for the Kernel to publish them.
The Test was described earlier. This time, with the bug fixes, there were no Index-violation exceptions. So, on my Laptop running Windows XP Home with 1 GB RAM and a single 1.6 GHz Intel Centrino Processor, I ran the "TimeWindowFPerfTest" performance test using the Sun JDK 1.6 and StreamCruncher 1.12 with the H2 Database.
I redirected the verbose Console output to a log file and thereby eliminated the otherwise excessive overhead added by the Console logging. This way, I also have proof of all the Tests that were performed.
The Test uses a Thread to generate and pump 'X' events in one shot without pausing. A Query with a Time based Partition is defined on this Stream. The Window size is 5 seconds. A "$row_status is new" clause is used to output only the new Events that arrive at the Window and not the ones that exit the Window when their 5 seconds are over. This way, an accurate measurement of how much overhead the Kernel imposes can be made. The total time taken for the entire batch to be inserted and then pumped out of the Kernel can also be calculated. This can then be used to calculate the Transactions per second - the most important metric.
The Test pumps these 'X' events, waits long enough for the Events to clear the area, and then pumps the same number again... and again. At the end of the Test, the results are retrieved and verified, and the Averages are calculated.
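Roughly, the harness has this shape (a sketch of the idea only - the batch sizes, waits and the pump/consume plumbing here are placeholders, not the actual TimeWindowFPerfTest code):
import java.util.concurrent.TimeUnit;

public class PerfHarnessSketch {
    public static void main(String[] args) throws InterruptedException {
        final int batches = 9;
        final int batchSize = 4000;
        long totalBatchMillis = 0;

        for (int i = 0; i < batches; i++) {
            long start = System.currentTimeMillis();
            pumpEvents(batchSize);        // insert the whole batch without pausing
            awaitKernelOutput(batchSize); // wait until the Kernel has published the batch
            totalBatchMillis += System.currentTimeMillis() - start;

            // Let the 5 second Window drain completely before the next batch.
            TimeUnit.SECONDS.sleep(8);
        }

        double avgBatchMillis = totalBatchMillis / (double) batches;
        double eventsPerSecond = batchSize / (avgBatchMillis / 1000.0);
        System.out.printf("Avg batch time: %.0f ms, roughly %.0f events/sec%n",
                avgBatchMillis, eventsPerSecond);
    }

    private static void pumpEvents(int count) { /* generate events and insert them into the Stream */ }

    private static void awaitKernelOutput(int count) { /* poll the output until 'count' events have been published */ }
}
With a batch of 8,000 events taking roughly a second end to end (see the Set 2 numbers below), that works out to the ~8,000 events per second quoted later.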
Ok, here it comes. Keep in mind that this is a single CPU and that the Event "pumper" and the Kernel are running in parallel. The H2 Database (current version) is completely single-Threaded - so there's no concurrency at all, even though StreamCruncher supports concurrent operations.
I ran 3 rounds for each configuration and here are the results:
Set 1 (4000 Events per Batch):
Set 1 - Round 1
Total events published: 36000. Each batch was of size:4000. Avg time to publish each event (Latency in Msecs): 224.0
Avg time (in Msecs) to insert 4000 Events into the DB: 418.0
Avg time (in Msecs) to process 4000 Events by the Kernel: 376.0
Avg time (in Msecs) for the insertion of first Event in the batch of 4000 Events into DB to publication of last Event in batch by Kernel: 598.0
Set 1 - Round 2
Total events published: 36000. Each batch was of size:4000. Avg time to publish each event (Latency in Msecs): 199.0
Avg time (in Msecs) to insert 4000 Events into the DB: 428.0
Avg time (in Msecs) to process 4000 Events by the Kernel: 397.0
Avg time (in Msecs) for the insertion of first Event in the batch of 4000 Events into DB to publication of last Event in batch by Kernel: 600.0
Set 1 - Round 3
Total events published: 36000. Each batch was of size:4000. Avg time to publish each event (Latency in Msecs): 261.0
Avg time (in Msecs) to insert 4000 Events into the DB: 387.0
Avg time (in Msecs) to process 4000 Events by the Kernel: 336.0
Avg time (in Msecs) for the insertion of first Event in the batch of 4000 Events into DB to publication of last Event in batch by Kernel: 591.0
Set 2 (8000 Events per Batch):
Set 2 - Round 1
Total events published: 64000. Each batch was of size:8000. Avg time to publish each event (Latency in Msecs): 378.0
Avg time (in Msecs) to insert 8000 Events into the DB: 603.0
Avg time (in Msecs) to process 8000 Events by the Kernel: 699.0
Avg time (in Msecs) for the insertion of first Event in the batch of 8000 Events into DB to publication of last Event in batch by Kernel: 1044.0
Set 2 - Round 2
Total events published: 64000. Each batch was of size:8000. Avg time to publish each event (Latency in Msecs): 457.0
Avg time (in Msecs) to insert 8000 Events into the DB: 533.0
Avg time (in Msecs) to process 8000 Events by the Kernel: 666.0
Avg time (in Msecs) for the insertion of first Event in the batch of 8000 Events into DB to publication of last Event in batch by Kernel: 1013.0
Set 2 - Round 3
Total events published: 64000. Each batch was of size:8000. Avg time to publish each event (Latency in Msecs): 392.0
Avg time (in Msecs) to insert 8000 Events into the DB: 593.0
Avg time (in Msecs) to process 8000 Events by the Kernel: 839.0
Avg time (in Msecs) for the insertion of first Event in the batch of 8000 Events into DB to publication of last Event in batch by Kernel: 1064.0
Set 3 (10,000 Events per Batch):
Set 3 - Round 1
Total events published: 70000. Each batch was of size:10000. Avg time to publish each event (Latency in Msecs): 491.0
Avg time (in Msecs) to insert 10000 Events into the DB: 705.0
Avg time (in Msecs) to process 10000 Events by the Kernel: 783.0
Avg time (in Msecs) for the insertion of first Event in the batch of 10000 Events into DB to publication of last Event in batch by Kernel: 1220.0
Set 3 - Round 2
Total events published: 70000. Each batch was of size:10000. Avg time to publish each event (Latency in Msecs): 518.0
Avg time (in Msecs) to insert 10000 Events into the DB: 689.0
Avg time (in Msecs) to process 10000 Events by the Kernel: 845.0
Avg time (in Msecs) for the insertion of first Event in the batch of 10000 Events into DB to publication of last Event in batch by Kernel: 1198.0
Set 3 - Round 3
Total events published: 70000. Each batch was of size:10000. Avg time to publish each event (Latency in Msecs): 513.0
Avg time (in Msecs) to insert 10000 Events into the DB: 647.0
Avg time (in Msecs) to process 10000 Events by the Kernel: 743.0
Avg time (in Msecs) for the insertion of first Event in the batch of 10000 Events into DB to publication of last Event in batch by Kernel: 1151.0
While the Tests were running, I kept noticing that CPU usage, even for the 10K Set, was not rising above ~15%, and only in 1 second bursts - which is quite puzzling. It might be because the Producer and the Consumer Threads are not really running in parallel, since the one common resource - the Database - is always locked by one of these 2 (sets of) Threads. I was expecting the CPU to peak and the Tests to crumble at the 10K Set. But they didn't, which is a very good sign.
This means that StreamCruncher can do 8,000 Transactions Per Second (Straight/Simple Processing) on a very ordinary setup and should perform considerably better on better hardware (more Cores and/or CPUs) and commercial Databases. This, combined with Horizontal Partitioning of the Stream data (using Pre-Filters and multiple Queries to split the Events and process them in parallel), should produce fantastic performance.
The test results/logs can be downloaded from here.
Monday, April 09, 2007
StreamCruncher 1.11 Beta is available. No changes to the code though. I had forgotten to update the Syntax diagram in 1.10 Beta which included the "$diff" and other custom Provider changes.
Saturday, April 07, 2007
StreamCruncher 1.10 Beta is ready! This release includes:
1) $diff clause for In-built Aggregate Functions along with custom Baseline Provider feature. ClusterHealthTest demonstrates this new feature.
2) Custom Window size Provider should now be declared in the Query, statically. TimeWFPartitionWinSizeProviderTest demonstrates this change.
3) A simple class StandAloneDemo shows how to run a sample using just the Java main(..) method. sc_run_standalone.bat can be used to run the Demo
Monday, March 26, 2007
Here's another reason why using an RDBMS in StreamCruncher was not a bad design decision at all. RDBMSes (what's the plural of RDBMS?) have attained a status (in Middleware) that is unequalled by any other technology. We trust RDBMSes to store our Bank balances, a nation's entire Census data, Criminal records - the list goes on and on. So, when such a technology is always within easy reach, common sense tells us that it should not just be used, but embraced... Ok, enough of that; on to the point...
Many people who are familiar with ESP and CEP probably wince at the mere mention of a Database when they learn that StreamCruncher uses an RDBMS underneath. To them, Databases are good, but not good enough for Stream Processing. To many, the RDBMS is the antithesis of Stream Processing. Why? They hold forth vehemently about the "presumed" drawbacks of such an architecture. They start lecturing about Performance. Speed. "Sub-millisecond latency cannot be achieved in a regular RDBMS"... and so on.
Well, those objections are not entirely true. Even if you ignore the fact that StreamCruncher already supports several Embedded, In-Memory, Real-time Databases that are routinely used in Telecom (the DBs), there is a classic solution for regular RDBMSes that makes them as good as any In-Memory Database. The biggest hurdle is the Hard disk - Persistence. Databases are meant to store data for posterity, but writing data to the Disk also adds a lot of latency. Everything else the Database offers - concurrency, scalability etc - is very much required for Event Processing; Persistence is not required for Event Stream Processing. So, a regular Enterprise class Database can still be used as the base for StreamCruncher - a solid/robust foundation on which to perform ESP/CEP - by creating the Database Tablespace on what is called a RAM Disk. A RAM Disk is just a soft drive: a portion of the Physical Memory/RAM is turned into a Storage Drive (like C:\ or /usr/home/myname). It acts like any other Drive where files can be created, except that everything gets wiped out when the Machine is rebooted. This technique is not new, but it fits perfectly in the context of StreamCruncher. RAM Drives can be created on almost any Operating System; a simple search on the Internet reveals several products and techniques for creating them.
So, all the licensing costs that a Company has incurred on acquiring and maintaining these "Big Daddy" Databases can still be leveraged for Event Stream Processing. In the end, StreamCruncher gets to use a time tested Database that Developers have been using in their other regular Projects and can rely on the stability of such DBs. The important Sorting, Pre-Filtering, Joining and other CPU-intensive operations happen in the Database Engine, which is in Native code. And of course the Developers' familiarity with such Databases also plays a crucial role in adoption and integration with other parts of the project.
Sunday, March 18, 2007
StreamCruncher 1.09 Beta is ready and it includes a major feature addition. From version 1.09 onwards, Multi-Stream Correlation, a.k.a. Pattern matching, is possible. This feature enables monitoring of, and correlation across, multiple Streams of Events using a simple SQL-like Query.
Although simple 2 Stream Correlation is possible using regular SQL plus Partitions, as demonstrated in the SLAAlertTest, watching for multiple Patterns across more than 2 Streams is now very easy with this new "alert..using..when.." clause.
MultiStreamEventGeneratorChainTest demonstrates this new feature.
Thursday, March 15, 2007
People have asked me how StreamCruncher performs. Until today I did not have a clear answer, because I did not have access to a good Server/PC. Last week I managed to borrow (just for a few minutes) a high end Intel/Windows 2000 Server - 4 CPUs at 3.6 GHz each, with a total of 3 GB RAM.
Out of curiosity, I ran the streamcruncher.test.func.h2.H2TimeWindowFPerfTest test with all the default configurations for the JVM, StreamCruncher 1.08 Beta, the H2 Database etc - Parallel GC, but with just 1 Collector Thread.
The test pumps 250 events in one burst, pauses for 6 seconds and repeats this for 4 cycles - so, 4 iterations of 250 events each. The test is simple Straight Through Processing over a Time Based Anonymous Partition that holds Events for 5 seconds. The Test measures the average time it takes for an Event to get expelled from the Kernel (published by the Kernel) from the time it was created. This does not include the 5 seconds it stays in the Window; it measures only the Overhead added by the Kernel's processing.
Keep in mind that StreamCruncher is still in Beta. There are still a few rough edges to be taken care of - StreamCruncher has neither been Profiled nor have load & performance tests ever been run.
On my 1.6 GHz Win XP Laptop with 1 GB RAM and JDK 1.6, the average Latency per Event for this test was a disappointing 380 Milliseconds. I was quite sure the single CPU was bogging down the performance, because the Test creates and inserts live, randomly generated Data while the heavily multi-threaded Kernel shares the same CPU. Since all these operations were in memory, there was practically no opportunity for the CPU to switch to another Thread while one Thread blocked on the Network or Disk. So, there was practically no concurrency.
But when I ran the same test on the 4 CPU Box, the average CPU utilization did not even change noticeably while spitting out Events at a fantastic 7-9 Millisecond latency. I was blown away by the numbers. Preliminary though it may be, the multi-threading change in Release 1.05 must've really paid off. Another important thing was that the Tests were printing verbose output to the Console, which slows down the whole application. So, with Zero logging or logging to a file, latency might've improved even further. However, an ugly bug reared its head when I re-directed the Output to a File: I kept getting a "Unique Constraint Violated" error quite often. This has to be fixed soon. Since the H2 Database, which in its current version does not support Multi-threaded access, was used in these tests, I'm hoping the performance of StreamCruncher will be exceptional on other In-Memory Databases like Oracle TimesTen.
Sometime over the next few months I'll conduct proper tests on a stabler version of StreamCruncher and publish my findings.
Wednesday, February 28, 2007
The JDBC Engineers at ANTs have been busy fixing the bugs (a & b) in their Driver. Today, I received an email from them saying:
We have been tracking your issue as case 1206. As per your statements, we had logged the following bugs, which have been resolved, fixed & verified.
Cool! So, we should be seeing these changes in their next release.
Bug 1415: JDBC getParameterTypeName() returns unknown for Date Datatype.
Bug 1416: JDBC getParameterClassName() method returns hard-coded string as "java.sql.ParameterMetaData"
Bug 1596: ANTs JDBC driver doesn't get registered
Bug 1597: ANTs JDBC Issue - NULL is inserted when Long.MIN_VALUE is inserted into bigint column