Wednesday, February 28, 2007

The JDBC Engineers at ANTs have been busy fixing the bugs (a & b) in their Driver. Today, I received an email from them saying:

We have been tracking your issue as case 1206. As per your statements, we had logged the following bugs, which have been resolved, fixed & verified.

Bug 1415: JDBC getParameterTypeName() returns unknown for Date Datatype.

Bug 1416: JDBC getParameterClassName() method returns hard-coded string as "java.sql.ParameterMetaData"

Bug 1596: ANTs JDBC driver doesn't get registered

Bug 1597: ANTs JDBC Issue - NULL is inserted when Long.MIN_VALUE is inserted into bigint column

Cool! So, we should be seeing these changes in their next release.
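
A minimal sketch of how the first two fixes could be checked once the new driver is out. It uses only the standard java.sql.ParameterMetaData interface; the class and method names (ParameterMetaDataReport, describe) are made up for illustration and nothing here is specific to the ANTs driver.

```java
import java.sql.ParameterMetaData;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

final class ParameterMetaDataReport {

    /** Returns one line per statement parameter: index, SQL type name, Java class name. */
    static List<String> describe(ParameterMetaData meta) throws SQLException {
        List<String> lines = new ArrayList<>();
        // JDBC parameter indexes start at 1, not 0.
        for (int i = 1; i <= meta.getParameterCount(); i++) {
            lines.add(i + ": " + meta.getParameterTypeName(i)
                    + " -> " + meta.getParameterClassName(i));
        }
        return lines;
    }
}
```

The metadata object comes from PreparedStatement.getParameterMetaData(). With a fixed driver, a Date parameter should presumably report a real type name instead of unknown, and a class name other than "java.sql.ParameterMetaData".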

Saturday, February 24, 2007

Ah, a harrowing 46-hour journey with long stops in between, and now I'm in the US.

Here's a short list of things to consider for a relatively easier flight, especially a Long-haul one. First-time travellers on such flights might find it useful.

  • Buy one or two Calling cards before you fly. This way you can call home later to say that you've reached safely (Mumbai/India: Airtel Cards or Reliance cards are sold in the airport)
  • Carry sufficient cash, but store it safely in different Cabin bags. Carry coins to use in public phones - for calling a Cab etc
  • Visit the Airline's website to check what you can carry and what you cannot. There are some things you cannot carry in Cabin bags, but are ok in Check-in bags
  • Long-haul flights wreak havoc on your lips, face and hands. The dry air in the Cabin and at Airports will cause serious lip cracks and very dry skin. Carry a small bottle of moisturizer and lip balm to rehydrate your skin. Believe me, it might sound silly, but going without will make you feel miserable later. Kids especially. New rules specify that such liquids/creams cannot exceed a certain quantity if you are carrying them in your hand/cabin bag. Check with your Airline
  • Carry some basic medication for Headaches, something for Colds and Muscle pain too (Neck strain). Ask your Doc
  • Be prepared to be stranded in the Airport. Flights can get re-scheduled or cancelled. Carry a change of clothes in your Cabin bags. You might have everything in your Check-in bags. But what if you've already checked them in and then the flight gets delayed? Warm clothing is also important. Put these in your Cabin bags
  • Make Photocopies of your tickets, visa, passport and contact details (both in the foreign country and at home), and keep them in a cabin bag separate from the one with the originals. Keep Emergency numbers in your wallet and a copy in your bag. Preserve your Boarding Passes until you've reached your final destination
  • If you are planning to drive, then get an International Driving Permit. It is accepted in some countries. Again, do some research beforehand. Read up on the local driving rules. You can get a copy of the Driver's Handbook online
  • Carry an unopened box of Breakfast Cereal or something like that in case you arrive at an odd time, like midnight and you can't go anywhere to eat
Bon voyage!

Wednesday, February 14, 2007

It's relocation time again. Singapore to Bangalore and now to the US. Which means that there probably won't be any StreamCruncher activity for the next few weeks. Not until I've settled down there.

Saturday, February 10, 2007

After a looong time, I spruced up my old Website. Over the past 5 years, I had struggled to keep the bookmarks on my Website up to date. Exporting from Netscape/Firefox to HTML, then cleaning it up into XML (I even wrote a couple of programs to convert the XML to a DOM Tree) was such a painful experience. So much so that I had entirely stopped updating my Bookmarks....until today.

Today, I bumped into a few cool Tools and Web Services that make Bookmark sharing a breeze. No, I'm not talking about those Social Bookmarking sites, I just didn't like them. They don't preserve any hierarchy, everything gets dumped into a big Tag cloud. I'm talking about Grazr and other OPML renderers. Take a look at this. Cool!! Isn't it?!

In just 4 steps. I'll tell you how:

  1. If you are using Firefox, get this BookmarksSynchronizer plug-in. It allows you to export your Bookmarks into XBEL format (Don't ask me what that is)
  2. Upload your XBEL exported Bookmarks into this YabFog Site, which converts XBEL into OPML (Again, don't look at me)
  3. Save the OPML format file and upload it into your Website
  4. Link your OPML file (the one you uploaded to your Site) through its URL from Grazr
That's it! You now have a super cool Widget to show off your Bookmarks.
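
For the curious, here is a rough sketch of what the XBEL to OPML step in the middle boils down to. This is not how YabFog does it (I have no idea what it runs). It is only an illustration using the JDK's built-in DOM and Transformer classes, and it assumes the usual XBEL shape (folder and bookmark elements, each with a title child, bookmarks carrying an href attribute) and the OPML 2.0 convention of outline elements with type="link" and a url attribute. The class name XbelToOpml is made up.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

final class XbelToOpml {

    /** Converts an XBEL document (given as a String) into an OPML String. */
    static String convert(String xbel) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document in = builder.parse(new InputSource(new StringReader(xbel)));
        Document out = builder.newDocument();

        Element opml = out.createElement("opml");
        opml.setAttribute("version", "2.0");
        out.appendChild(opml);

        Element head = out.createElement("head");
        Element title = out.createElement("title");
        title.setTextContent(titleOf(in.getDocumentElement()));
        head.appendChild(title);
        opml.appendChild(head);

        Element body = out.createElement("body");
        opml.appendChild(body);
        copyChildren(in.getDocumentElement(), body, out);

        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter writer = new StringWriter();
        transformer.transform(new DOMSource(out), new StreamResult(writer));
        return writer.toString();
    }

    /** Folders become nested outlines, bookmarks become link outlines; the hierarchy is kept. */
    private static void copyChildren(Element xbelParent, Element opmlParent, Document out) {
        for (Node n = xbelParent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (!(n instanceof Element)) {
                continue;
            }
            Element source = (Element) n;
            if (source.getTagName().equals("folder")) {
                Element outline = out.createElement("outline");
                outline.setAttribute("text", titleOf(source));
                opmlParent.appendChild(outline);
                copyChildren(source, outline, out);
            } else if (source.getTagName().equals("bookmark")) {
                Element outline = out.createElement("outline");
                outline.setAttribute("text", titleOf(source));
                outline.setAttribute("type", "link");
                outline.setAttribute("url", source.getAttribute("href"));
                opmlParent.appendChild(outline);
            }
            // Other XBEL elements (desc, info, separator, ...) are ignored in this sketch.
        }
    }

    /** Returns the text of the element's direct title child, or "" if it has none. */
    private static String titleOf(Element element) {
        for (Node n = element.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n instanceof Element && ((Element) n).getTagName().equals("title")) {
                return n.getTextContent().trim();
            }
        }
        return "";
    }
}
```

A real export usually starts with a DOCTYPE line, which the parser may try to resolve; the sketch assumes input without one.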

This is what it'll look like - My Bookmarks. You can even make it look like it's part of your Web Page, like I have in "My Favorites".

Friday, February 09, 2007

StreamCruncher 1.08 Beta is ready, with support for PointBase, the pure Java Database. PointBase is probably the most widely used embedded Java Database, although HSQL might beat it in terms of the sheer number of Desktops it is deployed on, as part of OpenOffice.

Even though PointBase does not support an In-Memory mode, the Database Page Size, Cache Flush Time and other settings can be modified to reduce Latency.

Wednesday, February 07, 2007

Speaking of computationally wasteful processing, yesterday's post listed a Query that needed re-writing.

select .... from
StreamA (partition by store last 10 minutes) as FirstStream,
StreamB (partition by store latest 25) as SecondStream

where FirstStream.eventId = SecondStream.eventId

and FirstStream.$row_status is not dead
and FirstStream.someColumn > 10
and SecondStream.$row_status is new
and SecondStream.otherColumn is not null;

Queries that use the "$row_status is not dead" clause must be careful not to perform too much filtering inside the main body of the Query, because the Criteria get re-evaluated every time the Query runs. So, if an Event remains in the Window for 20 Query Execution cycles, "someColumn > 10" gets evaluated for that Event 20 times.

Had we pushed this Filter criteria into the Pre-Filter clause, it would've been evaluated only once per Event. And, the Windows would not get polluted with Events that do not match the criteria, because they do not even make it into the Partitions.
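
To put numbers on that, here is a toy simulation in plain Java. It is not StreamCruncher code and does not use its API; the class and method names are invented, and the Window is just an int array holding someColumn values. It only counts how often the "someColumn > 10" test runs in each arrangement.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.IntPredicate;

final class PreFilterCost {

    /** Wraps a predicate so that every evaluation is counted. */
    static IntPredicate counting(IntPredicate predicate, AtomicLong counter) {
        return value -> {
            counter.incrementAndGet();
            return predicate.test(value);
        };
    }

    /** Filter in the main body: every Event in the Window is re-tested on every cycle. */
    static long rowsWithFilterInBody(int[] window, int cycles, IntPredicate filter) {
        long rowsPassedOn = 0;
        for (int cycle = 0; cycle < cycles; cycle++) {
            for (int someColumn : window) {
                if (filter.test(someColumn)) {
                    rowsPassedOn++;
                }
            }
        }
        return rowsPassedOn;
    }

    /** Pre-Filter: each Event is tested once on arrival; only matches enter the Window. */
    static long rowsWithPreFilter(int[] arrivals, int cycles, IntPredicate filter) {
        int eventsInWindow = 0;
        for (int someColumn : arrivals) {
            if (filter.test(someColumn)) {
                eventsInWindow++;
            }
        }
        // Later cycles just read the Window; the filter is never consulted again.
        return (long) eventsInWindow * cycles;
    }

    public static void main(String[] args) {
        int[] someColumn = {5, 12, 30, 7};
        AtomicLong inBody = new AtomicLong();
        AtomicLong preFiltered = new AtomicLong();
        rowsWithFilterInBody(someColumn, 20, counting(v -> v > 10, inBody));
        rowsWithPreFilter(someColumn, 20, counting(v -> v > 10, preFiltered));
        System.out.println("filter in body: " + inBody.get() + " evaluations");
        System.out.println("pre-filter:     " + preFiltered.get() + " evaluations");
    }
}
```

With 4 Events that stay in the Window for 20 cycles, the first arrangement evaluates the filter 80 times and the second only 4 times, while both pass the same Rows on to the rest of the Query.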

Tuesday, February 06, 2007

There are a few things I've been meaning to write about how to write StreamCruncher Queries to achieve good performance.

Suppose your Query filters Events based on some criteria in the Where clause, while correlating them with Events from other Streams, like this:

select .... from

StreamA (partition by store last 10 minutes) as FirstStream,
StreamB (partition by store latest 25) as SecondStream

where FirstStream.eventId = SecondStream.eventId

and FirstStream.$row_status is not dead
and FirstStream.someColumn > 10
and SecondStream.$row_status is new
and SecondStream.otherColumn is not null;

Here are some simple tips. The same concepts that you learned while optimizing SQL Queries apply here too.
a) Re-arrange the Filter conditions in the Where clause so that they come before the Join (Co-relation), to reduce the candidate Rows. In the Query above, the First and Second Streams are Joined on eventId and the resulting combined Rows/Events are then filtered using the subsequent Filter criteria like "FirstStream.$row_status is not dead and FirstStream.someColumn > 10..". This is computationally wasteful because the Database joins all those additional Rows/Events from the 2 Streams and then removes the ones that do not match the Criteria.
b) If the Events need to be Filtered, use the Pre-Filter clause in the Partition definition to consume only the required Events

Thus, the final Query should look like this:

select .... from

StreamA (partition by store last 10 minutes
where someColumn > 10) as FirstStream,

StreamB (partition by store latest 25
where otherColumn is not null) as SecondStream

where SecondStream.$row_status is new

and FirstStream.$row_status is not dead
and FirstStream.eventId = SecondStream.eventId;

You will notice that the Events are Pre-filtered in the Partition clause itself, where the unnecessary Events are weeded out even before they enter the Partitions. And since Events are fetched into Partitions in parallel with Query execution, you can shave off precious milliseconds by Pre-Filtering.
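
The saving can be illustrated with a deliberately naive nested-loop join in plain Java. This is only a sketch: the Event class and both methods are invented for the example, and a real Database may well plan the Join differently. It compares how many Row pairs the Join has to look at when the "someColumn > 10" filter runs after the Join versus before it.

```java
import java.util.ArrayList;
import java.util.List;

final class FilterBeforeJoin {

    /** A made-up Event shape: the join key plus one filterable column. */
    static final class Event {
        final int eventId;
        final int someColumn;

        Event(int eventId, int someColumn) {
            this.eventId = eventId;
            this.someColumn = someColumn;
        }
    }

    /** Joins on eventId first, filters the joined Rows afterwards. Returns {pairs compared, result rows}. */
    static long[] joinThenFilter(List<Event> first, List<Event> second) {
        long compared = 0;
        long results = 0;
        for (Event a : first) {
            for (Event b : second) {
                compared++;
                if (a.eventId == b.eventId && a.someColumn > 10) {
                    results++;
                }
            }
        }
        return new long[] {compared, results};
    }

    /** Filters the first Stream up front, so fewer Rows reach the Join. Returns {pairs compared, result rows}. */
    static long[] filterThenJoin(List<Event> first, List<Event> second) {
        List<Event> kept = new ArrayList<>();
        for (Event a : first) {
            if (a.someColumn > 10) {
                kept.add(a);
            }
        }
        long compared = 0;
        long results = 0;
        for (Event a : kept) {
            for (Event b : second) {
                compared++;
                if (a.eventId == b.eventId) {
                    results++;
                }
            }
        }
        return new long[] {compared, results};
    }
}
```

Both methods return the same result Rows; only the number of pairs compared differs, and the gap grows with the share of Events the filter rejects.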

Another trick is to use the Table/EventStream with the least number of Rows/Events as the "Driving Table" (the first Table in the Join). Here, the "SecondStream.$row_status is new" condition will match fewer Events because it picks up only the newly arrived Events to join with the other Stream. This speeds up Join processing if the Database underneath uses Hash-Joins.
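
Why the smaller side helps a Hash-Join can be seen in a bare-bones version of the algorithm. This is a textbook-style sketch in plain Java, not what any particular Database does internally; the class and method names are invented and each Event is reduced to its eventId.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class HashJoinSketch {

    /** Build phase: hash one input by eventId, counting duplicates. */
    static Map<Integer, Integer> buildTable(List<Integer> buildSide) {
        Map<Integer, Integer> countsByEventId = new HashMap<>();
        for (Integer eventId : buildSide) {
            countsByEventId.merge(eventId, 1, Integer::sum);
        }
        return countsByEventId;
    }

    /** Probe phase: stream the other input past the hash table and count matching pairs. */
    static int probe(Map<Integer, Integer> table, List<Integer> probeSide) {
        int matches = 0;
        for (Integer eventId : probeSide) {
            matches += table.getOrDefault(eventId, 0);
        }
        return matches;
    }

    /** Full join: the first argument is hashed, the second is only scanned. */
    static int hashJoin(List<Integer> buildSide, List<Integer> probeSide) {
        return probe(buildTable(buildSide), probeSide);
    }
}
```

The result is the same whichever input is hashed, but the hash table has to hold the whole build side. Hashing the handful of newly arrived Events and scanning the larger Window past it keeps that table small.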

It is also recommended to place filter criteria like "$row_status .." (the ones that cannot be Pre-Filtered) before the Join clause. So, put "FirstStream.eventId = SecondStream.eventId" at the end, after culling the Rows that are not needed, so that only the required Rows are presented to the Join clause.

StreamCruncher 1.07 Beta is ready to be downloaded!

In this release:

  1. Solid BoostEngine's newer JDBC Driver (Build 04.50.0110 and above) does not have issues with ResultSet.getTimestamp() on Timestamp columns [Old Post]. The StreamCruncher patch for this has been removed as there is no need for it anymore
  2. Another cool feature in StreamCruncher is the ability to have Windows in the same Partition with different Window sizes. WindowSizeProvider and TimeWindowSizeProvider classes in the streamcruncher.api package can be used for such customizations. A detailed example is provided in TimeWFPartitionWinSizeProviderTest

Friday, February 02, 2007

2 more Bug reports for the ANTs driver:

  • Bug number [1596]: ANTs JDBC driver doesn't get registered
  • Bug number [1597]: ANTs JDBC Issue - NULL is inserted when Long.MIN_VALUE is inserted into bigint column
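
For reference, a small sketch of how the registration problem can be detected from application code. It relies only on the standard java.sql.DriverManager; the class name, method name and the URL in the usage note are made up, and no ANTs-specific class names or URLs are assumed.

```java
import java.sql.DriverManager;
import java.sql.SQLException;

final class DriverRegistrationCheck {

    /** Returns true if some registered JDBC Driver claims to understand the given URL. */
    static boolean isDriverRegistered(String jdbcUrl) {
        try {
            return DriverManager.getDriver(jdbcUrl) != null;
        } catch (SQLException e) {
            // DriverManager throws when no registered Driver accepts the URL.
            return false;
        }
    }
}
```

A call such as isDriverRegistered("jdbc:somevendor://localhost/test") returns false until a matching Driver has registered itself. Drivers normally do that when their class is loaded, for example through Class.forName with the Driver's class name.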