Sunday, September 26, 2010

Txn.commit() - Are you sure?

[+ indicates updated on Sep 27, 2010]

Transactions - do we need them and are people really using them the way they claim to?

We know that transactions are theoretically the best way to keep data consistent, but they might not always be the most practical way to do it.

[+]
There could be a variety of reasons:
 - Reduced performance after using transactions
 - Lack of proper XA support across all the participating resources
    - "Last resource commit"/XA emulation can leave some edge cases in a mess
    - There could be more than 1 resource that does not support XA. In such cases emulation will not work
 - There could be a need for nested transactions which are not widely supported
 - The transaction manager might not have proper support for repair/recovery of heuristic hazards
 - Multi-step transactions that need savepoints, and the lack of proper support or semantics for restoring them
    - Transactions that might be too expensive to retry from the beginning
    - If the client program crashes, then having a new client continue the transaction might not be feasible
    - Multi-page, lengthy UI forms that need disconnected data sets
 - Impractical for long-running transactions, and so on

 Many others have written about it. I'd rather refer to their notes than write my own from scratch:
 - Starbucks Does Not Use Two-Phase Commit
 - ACID Transactions Are Overrated
 - Computer says no
 - Transactions - Overused Or Just Misunderstood (Mark Little)
 
Remember - if transactions work for you and all your systems support them, then go for it.

Having said that, there are still many systems where data flows across large applications and where a simpler, more resilient and more predictable compensating mechanism is suitable. Simpler it may be, but designing such systems requires a lot of foresight and expertise:
  - Optimistic concurrency based on version numbers
  - Atomic compare-and-swap upsert/update operations
  - Polite spin locks and backoff-retry mechanisms
  - Clear error reporting
  - State capture, repair and consistency checking
  - Operation logging, undo and re-apply
  - Proper documentation and involvement of Developer/Architect
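
To make the first few items concrete, here is a minimal in-memory sketch in Java of version-number based optimistic concurrency, an atomic compare-and-swap update and a polite backoff-retry loop. Everything in it (VersionedStore, Versioned, writeIfVersion, updateWithRetry) is a hypothetical name made up for illustration, not an API from any library or from the articles above. A real system would do the same check against a version column or a conditional update in the data store.

```java
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.IntUnaryOperator;

// Hypothetical sketch: a single value guarded by a version number.
final class VersionedStore {
    /** Immutable snapshot: a value plus the version it was written at. */
    static final class Versioned {
        final long version;
        final int value;
        Versioned(long version, int value) { this.version = version; this.value = value; }
    }

    private final AtomicReference<Versioned> cell =
            new AtomicReference<>(new Versioned(0, 0));

    Versioned read() { return cell.get(); }

    /** Writes only if the caller's version is still current; false means a stale write. */
    boolean writeIfVersion(long expectedVersion, int newValue) {
        Versioned current = cell.get();
        if (current.version != expectedVersion) {
            return false; // someone else committed first
        }
        // The CAS fails if another writer replaced 'current' in the meantime.
        return cell.compareAndSet(current, new Versioned(expectedVersion + 1, newValue));
    }

    /** Read, compute, conditionally write; back off with jitter and retry on conflict. */
    Versioned updateWithRetry(IntUnaryOperator change, int maxAttempts) {
        long backoffMillis = 1;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            Versioned seen = cell.get();
            int next = change.applyAsInt(seen.value);
            if (writeIfVersion(seen.version, next)) {
                return new Versioned(seen.version + 1, next);
            }
            try {
                // Random sleep so competing writers do not retry in lockstep.
                Thread.sleep(ThreadLocalRandom.current().nextLong(backoffMillis + 1));
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while retrying update", e);
            }
            backoffMillis = Math.min(backoffMillis * 2, 100); // capped exponential backoff
        }
        // Clear error reporting: say what was attempted and how often.
        throw new IllegalStateException("update gave up after " + maxAttempts + " attempts");
    }

    public static void main(String[] args) {
        VersionedStore store = new VersionedStore();
        Versioned v = store.updateWithRetry(x -> x + 5, 3);
        System.out.println("version=" + v.version + " value=" + v.value);
        // A writer still holding version 0 is rejected instead of overwriting.
        System.out.println("stale write accepted? " + store.writeIfVersion(0, 99));
    }
}
```

The point of the design is that a conflicting writer is told "no" and gets to decide what to do next (retry, report, compensate) instead of being blocked inside a lock or a distributed transaction.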

For much larger systems like Amazon, LinkedIn and the like, availability is as important as consistency. See earlier references - #1, #2, #3 and #4.

Some interesting notes on Transactions that I keep referring to every now and then:
  - XA Exposed, Part III: The Implementor's Notebook
  - Distributed Transactions and Two-Phase Commit

Saturday, September 25, 2010

Hiking in Upper Stevens Creek County Park

Upper Stevens Creek County Park is just opposite Long Ridge Open Space Preserve on CA 35. You have to go down Grizzly Flat trail and then back up. There are 2 trailheads next to each other: the North and the South legs. You can go down one and come back up the other.

The trail goes about 1.1 miles down to where the North and South legs meet. You can go further down to Grizzly flats junction where it meets Canyon trail. There is a nice stream at the bottom. You can cross it and go further down towards Page Mill. I, however, turned around and came back up.

This is not an easy trail. It is about as steep as Windy Hill Open Space Preserve, probably a little less. But the whole trail is always in the shade and is very quiet and pleasant.

Saturday, September 18, 2010

Whither ORM?

If you are scratching your head wondering why even after so many years there is still a confusing mix of ORM solutions, then you are not alone.

JPA/JDO/Hibernate/Spring/Mybatis/Cayenne - which one and why?

And if you are planning to write something on your own when there already are so many, then you should probably go read about NIH.

I'm actually amazed that even after the relative acceptance of NoSQL we are still struggling to standardize on ORM for SQL/RDBMS.

It's almost like a religious debate:
 - https://www.jfire.org/modules/phpwiki/index.php/Why%20not%20JPA
 - http://www.datanucleus.org/products/accessplatform/persistence_api.html
 - http://www.datanucleus.org/products/accessplatform/jdo_jpa_faq.html
 - http://www.dzone.com/links/search.html?query=jdo
 - http://www.dzone.com/links/search.html?query=jpa
 - http://java.dzone.com/articles/jpa-performance-optimization

Good old Apache itself has 2 solid implementations of JPA and JDO. Both seem very mature and very well documented:
 - http://db.apache.org/jdo/why_jdo.html
 - http://openjpa.apache.org/

Apache also has some offbeat/non-standard implementations. Some dead, some doing well:
 - http://cayenne.apache.org/why-cayenne.html
 - http://db.apache.org/
 - http://www.mybatis.org/ - formerly Apache iBatis

Reading a busy Twitter stream with @s and #s is as hard as parsing unformatted XML with your eyes - correction, with 1 eye closed.

Books I read in the last few months

Blindsight by Peter Watts: I thoroughly enjoyed reading this book. Although the ending was a bit of a let down, the amount of research that has gone into writing this book is impressive. It has a very refreshing combination of bio-chemistry, human vision, psychology and AI.

Galactic North by Alastair Reynolds: A collection of short stories. Generally, I try to stay away from short stories because I feel the characters do not have time to develop and neither does the story. This one however has a continuous feel across stories and is worth reading if you liked Revelation Space.


Eifelheim by Michael Flynn: Another Hugo nominee (I think). Not too bad if you'd rather have the story wander off into a medieval village setting during the time of the Black Death. Certainly not in the same league as the Sci-Fi masters.

Liar's Poker by Michael Lewis: This is not sci-fi at all. It's a 20-year-old book about an investment bank - the infamous Salomon Brothers. They say history repeats itself. Just replace Salomon Brothers with Lehman Brothers and add a generous measure of greed and short-sightedness. This is a very funny book considering what the book is all about. Well worth the read.

Spin by Robert Charles Wilson: Here's one I tried reading but just couldn't get myself to finish. For a Hugo award nominee this was a disastrous read.

Sunday, September 12, 2010

Hiking in Skyline Ridge Open Space Preserve

Skyline Ridge Open Space Preserve is right next door to Russian Ridge off CA 35. This is perhaps one of the nicest hikes for beginners. Gentle slopes, a good combination of shade and open trails, a great view just a mile into the hike, a park bench and a pond (actually 2) at the end of the 1.5 mile Ipiwa trail.

Start from the Skyline parking lot and follow the Ipiwa trail, cross Old Page Mill road and you will reach Alpine Pond. There are many shorter walkways you can use to spend time around the pond. Then head back the same way to Skyline parking.