Sunday, September 26, 2010

Txn.commit() - Are you sure?

[+ indicates updated on Sep 27, 2010]

Transactions - do we need them, and are people really using them the way they claim to?

We know that transactions are theoretically the best way to keep data consistent, but they might not always be the most practical way to do it.

There could be a variety of reasons:
 - Reduced performance after using transactions
 - Lack of proper XA support across all the participating resources
    - "Last resource commit"/XA emulation can leave some edge cases in a mess
    - There could be more than one resource that does not support XA. In such cases emulation will not work, because only one non-XA resource can take the "last resource" slot
 - There could be a need for nested transactions which are not widely supported
 - The transaction manager might not have proper support for repair/recovery of heuristic hazards
 - Multi-step transactions that need savepoints, where proper support or semantics for restoring them is lacking
    - Transactions that might be too expensive to retry from the beginning
    - If the client program crashes, then having a new client continue the transaction might not be feasible
    - Multi-page, lengthy UI forms that need disconnected data sets
 - Impractical for long-running transactions, and so on

Many others have written about it. I'd rather refer to their notes than write my own from scratch:
 - Starbucks Does Not Use Two-Phase Commit
 - ACID Transactions Are Overrated
 - Computer says no
 - Transactions - Overused Or Just Misunderstood (Mark Little)
Remember - if transactions work for you and all your systems support them, then go for it.

Having said that, there are still many systems where data flows across large applications and where a simpler, more resilient and more predictable compensating mechanism is suitable. Simpler it may be, but designing such systems requires a lot of foresight and expertise:
  - Optimistic concurrency based on version numbers
  - Atomic compare-and-swap upsert/update operations
  - Polite spin locks and backoff-retry mechanisms
  - Clear error reporting
  - State capture, repair and consistency checking
  - Operation logging, undo and re-apply
  - Proper documentation and involvement of Developer/Architect
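
To make the first three items a little more concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions: VersionedStore, compare_and_set and update_with_retry are hypothetical names invented for this example, the store is a plain in-memory dictionary, and the internal lock merely stands in for whatever atomic conditional update a real data store would provide.

```python
import random
import threading
import time


class VersionConflict(Exception):
    """Raised when every retry lost the race to another writer."""


class VersionedStore:
    """Hypothetical in-memory store: each key maps to (version, value)."""

    def __init__(self):
        # Assumption: this lock stands in for the store's own atomic update.
        self._lock = threading.Lock()
        self._rows = {}

    def read(self, key):
        with self._lock:
            # A missing key is reported as version 0 with no value.
            return self._rows.get(key, (0, None))

    def compare_and_set(self, key, expected_version, new_value):
        """Write only if the version is unchanged; return True on success."""
        with self._lock:
            current_version, _ = self._rows.get(key, (0, None))
            if current_version != expected_version:
                return False  # someone else updated the row first
            self._rows[key] = (current_version + 1, new_value)
            return True


def update_with_retry(store, key, change, max_attempts=8, base_delay=0.001):
    """Read, compute, conditionally write; back off and retry on conflict."""
    for attempt in range(max_attempts):
        version, value = store.read(key)
        if store.compare_and_set(key, version, change(value)):
            return version + 1
        # Polite backoff: randomized delay whose upper bound doubles each time.
        time.sleep(random.uniform(0, base_delay * (2 ** attempt)))
    raise VersionConflict(f"gave up on {key!r} after {max_attempts} attempts")


store = VersionedStore()
store.compare_and_set("balance", 0, 100)  # insert: version 0 means "absent"

new_version = update_with_retry(store, "balance", lambda v: v - 30)
stale_write = store.compare_and_set("balance", 1, 999)  # uses an old version
```

The stale write at the end is rejected because the version it read is no longer current. No lock is held between the read and the write, so a slow or crashed client cannot block anyone else; the price is that callers must be prepared to retry, and to report clearly when they give up.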

For much larger systems like Amazon, LinkedIn and the like, availability is as important as consistency. See earlier references - #1, #2, #3 and #4.

Some interesting notes on Transactions that I keep referring to every now and then:
  - XA Exposed, Part III: The Implementor's Notebook
  - Distributed Transactions and Two-Phase Commit