Speaking of computationally wasteful processing, yesterday's post listed a Query that needed re-writing.
select .... from
StreamA (partition by store last 10 minutes) as FirstStream,
StreamB (partition by store latest 25) as SecondStream
where FirstStream.eventId = SecondStream.eventId
and FirstStream.$row_status is not dead
and FirstStream.someColumn > 10
and SecondStream.$row_status is new
and SecondStream.otherColumn is not null;
Queries using the "$row_status is not dead" clause must be careful not to perform too much filtering inside the main body of the Query, because the Criteria get re-evaluated every time the Query runs. So, if an Event remains in the Window for 20 Query Execution cycles, the "FirstStream.someColumn > 10" check gets evaluated for that Event 20 times.
Had we pushed this Filter criterion into the Pre-Filter clause, it would've been evaluated only once per Event. The Windows would also not get polluted with Events that do not match the criterion, because those Events never even make it into the Partitions.
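To make the difference concrete, here is a rough Python sketch that only counts predicate evaluations. It is not how the engine is implemented: the function names, the single "latest N" Window, and the assumption that the Query runs once per arriving Event are all made up for illustration.

```python
from collections import deque


def run_body_filter(events, window_size, predicate):
    """Filter in the Query body: every Event in the Window is re-checked on each run."""
    window = deque(maxlen=window_size)  # hypothetical "latest N" Window
    evaluations = 0
    for event in events:
        window.append(event)  # every Event enters the Window, matching or not
        # Assumption: one Query Execution cycle per arriving Event.
        for windowed_event in window:
            evaluations += 1
            predicate(windowed_event)
    return evaluations, list(window)


def run_pre_filter(events, window_size, predicate):
    """Filter before the Window: each Event is checked once, on arrival."""
    window = deque(maxlen=window_size)
    evaluations = 0
    for event in events:
        evaluations += 1
        if predicate(event):
            window.append(event)  # only matching Events reach the Window
        # The Query cycle needs no re-check: everything in the Window already matched.
    return evaluations, list(window)


# Made-up sample data: 100 Events whose someColumn cycles through 0..19.
events = [{"eventId": i, "someColumn": i % 20} for i in range(100)]


def some_column_over_10(event):
    return event["someColumn"] > 10


body_evals, body_window = run_body_filter(events, 20, some_column_over_10)
pre_evals, pre_window = run_pre_filter(events, 20, some_column_over_10)
print("body filter evaluations:", body_evals)
print("pre-filter evaluations:", pre_evals)
```

In this toy setup the body-filter version re-checks every windowed Event on every run, while the pre-filter version checks each Event exactly once and its Window holds only matching Events.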