Pic taken by my dad - Dr. Jayaprakash.
Sunday, December 27, 2009
Monday, December 21, 2009
Primary Key object - an under appreciated programming idiom
[Updated: Jan 4 2010 - CustomerKey class has more details]
There is, in my opinion, an often underused idiom: the Primary Key object. If you've used Container Managed Persistence in EJB or Hibernate or JPA then you probably know what I'm talking about. It's the very simple idea of creating a Serializable POJO and storing the unique Id(s) of the entity as its fields.
JEE 6 says this about Primary Key Classes. You override equals() and hashCode() and you are all set.
import java.io.Serializable;

public class CustomerKey implements Serializable {
    private static final long serialVersionUID = 1L;
    protected String passportId;
    
    protected String familyName;
    
    /**
     * @param passportId - The actual key property.
     * @param familyName - A non-key property to assist in caching/optimization
     *                     etc.
     */
    public CustomerKey(String passportId, String familyName) {
        this.passportId = passportId;
        this.familyName = familyName; 
    }
 
    public String getPassportId() {
        return passportId;
    }
    
    public String getFamilyName() {
        return familyName;
    }
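    @Override
    public int hashCode() {
        // Only uses passportId for the hash.
        return passportId == null ? 0 : passportId.hashCode();
    }

    @Override
    public boolean equals(Object obj) {
        // Only checks for a passportId match; familyName is ignored.
        if (this == obj) {
            return true;
        }
        if (!(obj instanceof CustomerKey)) {
            return false;
        }
        CustomerKey other = (CustomerKey) obj;
        return passportId == null ? other.passportId == null
                                  : passportId.equals(other.passportId);
    }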
}
So, why is it interesting? The use of Serializable POJOs does not have to be restricted to modeling Database persisted objects.
I've seen people make the mistake of premature optimization, i.e. deciding to use a String or long or int (some basic Java type) as the unique Id of an entity because it appears to satisfy the immediate requirement of an Id. The operative words being "appears" and "immediate". Using just 4 or 8 bytes to store the Id might seem like huge savings in space/memory at first but let's look at what you are missing:
- When your data grows and you realize that the 4/8 byte number is not enough what do you do?
- Your system has an internal Id scheme and the upstream system has another scheme
- By using Java primitives as your internal scheme you have very likely lost the original/upstream Id
- This means you cannot track the record back to its source. You might think you will not need it but try saying that to your customer during a production crisis when you've lost some records and you can't tell which ones exactly (Ouch!)
 
- Remember The Law of Leaky Abstractions? Once you start with your own internal/proprietary Id scheme it will eventually leak outside to the user/customer. Now your customer will want to use it directly and you are forced to expose/support/maintain that Id scheme (Double ouch!)
 
- What you thought could be easily generated as a monotonically increasing number on 1 machine is now impossible to handle on a cluster of machines (unless you have a singleton Id generator - SPOF!)
- Now that your data has grown and you have started using a Grid of some sort, here's what you would've liked to do had you taken the POJO Id route
- Data locality hints: You have a family of related objects and you want all of them to reside in the same machine instead of being spread across the grid randomly. So you stick a field into all the Primary Key (PK) POJOs called "customerId". Now all the OrderPK, ShipmentPK, FulfilmentPK, OrderLineItemPK keys will have a "customerId" in them but they need not be part of the equals()/hashCode() combo. So, you can program your Grid to make use of this "customerId" hint to place all the objects of the same family together and speed up your queries/retrievals
- Covering Index: If you realize over time that your application tends to retrieve only certain specific fields for processing and not all the data columns, you could actually move those fields into the PK class. This way you will get the columns you need along with the PK object and not have to download the values at all. Shaves a lot of time off retrievals but remember these things consume a little more memory. More on Covering Indexes
- Migrating data also becomes easier as you can add hints and version numbers into PKs
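As a rough sketch of the data locality hint described above, the key below carries a customerId but keeps it out of equals()/hashCode(). All the names here (LocalityHintDemo, OrderPK, orderId) are made up for illustration, and how a real Grid reads the hint is product-specific, so that part is left out.

```java
import java.io.Serializable;
import java.util.HashMap;
import java.util.Map;

public class LocalityHintDemo {

    /** Hypothetical key for an Order; customerId is only a placement hint. */
    public static final class OrderPK implements Serializable {
        private static final long serialVersionUID = 1L;

        private final String orderId;    // the actual key property
        private final String customerId; // non-key locality hint, may be null

        public OrderPK(String orderId, String customerId) {
            if (orderId == null) {
                throw new IllegalArgumentException("orderId is required");
            }
            this.orderId = orderId;
            this.customerId = customerId;
        }

        public String getOrderId() {
            return orderId;
        }

        public String getCustomerId() {
            return customerId;
        }

        @Override
        public int hashCode() {
            // The hint is deliberately left out of the hash.
            return orderId.hashCode();
        }

        @Override
        public boolean equals(Object obj) {
            // The hint is deliberately left out of the comparison.
            return obj instanceof OrderPK && orderId.equals(((OrderPK) obj).orderId);
        }
    }

    public static void main(String[] args) {
        Map<OrderPK, String> orders = new HashMap<OrderPK, String>();
        orders.put(new OrderPK("order-42", "customer-7"), "2 x widgets");

        // The lookup succeeds even when the caller does not know the hint.
        System.out.println(orders.get(new OrderPK("order-42", null))); // prints 2 x widgets
    }
}
```

Because the hint never takes part in equality, it can be added, changed or dropped later without invalidating keys that are already stored.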
 
Overall, using a plain java.lang.Object or better yet, a well defined basic Interface as the Primary Key will go a long way.
Remember - Good design is always future proof.
Hibernate/JPA has a lot of nice annotations for these - @IdClass and @Id.
Saturday, December 19, 2009
Graph DBs - The other NoSQL
Yeah, you guessed it right. NoSQL has been the theme for this month.
Graph DBs - they are another alternative to Relational DBs and are gaining momentum as part of the NoSQL family. Graph DBs have always fascinated me because they do not require Schemas (!) and unlike the crippled, de-normalized NoSQL formats you hear about, Graphs can store relationships. That means the Joins are already there as relationships.
- My favorite Graph DB is Neo4J (obviously, since it's pure Java), which is gaining some popularity and funding
- Here's an interesting entry about Freebase's internal graph engine - graphd
- Here's some SQL magic on - Trees In The Database - Advanced data structures
- Some more SQL meets social networks
Friday, December 18, 2009
Comp Sci algos and theory behind NoSQL
Very informative docs on distributed systems:
# NoSQL Patterns by Ricky Ho
# Design Patterns for Distributed Non-Relational Databases by Todd Lipcon
Tuesday, December 15, 2009
The rise and rise (again) of Ingres; Gartner vs ZL and other essays
Mark Logic CEO Dave Kellogg recounts his experience at Ingres and how Oracle dominated the DBMS market during its early years.
In another post, he writes about an interesting legal battle ensuing between Gartner and a tech company ZL Technologies. It really is scary if you think about it - be it Stock rating agencies or movie critics. Any self-appointed "authority on a subject" that gains a lot of visibility automatically ends up only acquiring more authority than before. Call it what you like - The Cluster effect or PageRank etc etc ... You could have a majority opinion but it doesn't necessarily mean you are correct/right.
Saturday, December 12, 2009
Articles on OSS and de-commoditizing technology
Some things to learn from successful Open Source Software (OSS) companies. Notes taken by Nati Shalom - Takeaway from Qcon – Part I.
Many agree that OSS has had a steady commoditizing effect on technology. While there are no  doubts about some of its merits, here's a post that does not talk about OSS per se, but about the non-technology side of things that we engineers rarely (want to) see - It’s The Relationship, Stupid! (Part1) - Stop Commoditizing The Client Facing Workforce.
The book I mentioned in a previous post also talks about de-commoditizing your software.
Monday, December 07, 2009
Book: Don't just roll the dice (Software pricing guide)
I read a small book during my road trip last weekend. It's called "Don't just roll the dice - A usefully short guide to software pricing". It's written by Neil Davidson, Co-founder and joint CEO of Red Gate Software, a small-ish Software company that is doing well.
It's worth reading, even if you think you know about Price-vs-Demand, Support, Sales etc etc. His Blog is also very educative - Business of Software blog.
Deleting sub-folders based on a pattern (Windows)
I found this little script to delete ".svn" folders recursively in Windows. It's easy on Unix but I wanted one for Windows. Sweet! You can modify it to match and delete any pattern.
Monday, November 30, 2009
Road trip to Mt Rainier & Seattle via Weed
Last weekend I went on a road trip to Mt Rainier and Seattle, WA via Portland. We reached Portland, OR on Thanksgiving eve; couple that with non-stop rain and it wasn't the best time to visit what seemed like a deserted city.
A nice lake on the way to Rainier and the sunny weather misled us into believing that it would be nice at the summit too. Boy, were we wrong...
We headed to Mt Rainier only to find out that the trails needed snow shoes and ski season wasn't going to open until mid-December. Not that I wanted to ski, but bad weather followed us here too. We went as far as the visitor center and still did not catch a glimpse of the summit.
Strangely, Seattle turned out to be nice. It didn't rain for almost 1.5 days straight! If you visit Seattle, make sure you visit the Museum of Flight. My friends who did the Underground Tour said it was also worth it. There were several other places we could've visited but we ran out of time.
We drove past Mt Shasta, CA on our way back and pretended that it was Mt Rainier and justified our long trip to WA.
Mt Shasta from Weed, CA (Yes, Weed is a city in CA and no they don't sell weed).
Until next time....
Roast of NoSQL (by Brian Aker)
For those of you who've been drinking a little too much of the NoSQL Kool-aid lately, here's a funny video by the MySQL hacker Brian Aker picking on Map-reduce:
Thursday, November 19, 2009
Mark Reinhold said - Let there be Closures!
Yes! Finally, somebody is listening - JDK7 might have Closures after all.
Wednesday, November 11, 2009
They took his Jars!
(They Took His Dog! (Southpark))
Great! Google announced a new programming language called Go. It looks like a cross between JavaScript x Python x C. Why go invent another language, like we don't already have enough? There's Scala, Clojure, Groovy, JRuby, not to mention plain Ruby... Does this mean they gave up on C++0x? Looks like it.
I read this y'day on someone's blog and I couldn't agree with him more:
Source: Benjamin Black
Tuesday, November 10, 2009
Debate on Dynamo and Cassandra
Very lengthy and heated debate on Amazon Dynamo and Facebook/Apache Cassandra - Dynamo - Part I: a followup and re-rebuttals. Mostly from the Clustering heavyweights at Facebook. Funny and educative.
Saturday, November 07, 2009
Friday, November 06, 2009
The Startup Founder Visa Movement
After mulling over whether to post this link or not for a while, worrying whether some might find such topics "unsavory" I decided to post this anyway.
There are many luminaries in the Industry who support this. Naturally, there will also be many against this (justifiably so). 
Anyway...here it is. There are a lot of links to follow in this article: Startup Founder Visa Movement
Cheers!
Thursday, November 05, 2009
I *heart* LINQ
.Net 4 is looking more and more attractive. I absolutely loved and at the same time was jealous of the LINQ feature in .Net.
Now with PLINQ (Parallel LINQ) it has just become so attractive that it's almost illegal. At least that's how I see it compared to the language features in Java, which hasn't changed since 1.5.
For those of you who are familiar with Event Stream Processing and Continuous Queries, PLINQ is like a dream come true.
Sure, we have Fork-Join and the awesome "java.util.concurrent" package but the language features are somewhat lacking. Where are Closures, to start with?
This presentation by Stephen Toub is a very good introduction - Parallel programming in .Net 4 and Visual Studio 2010.
Wednesday, November 04, 2009
Fall colors (or false colors) in Sonoma, CA
Last weekend I drove all the way up to Sonoma with friends hoping to see Fall colors. After doing a lot of research on the best place to view Fall colors in California, I settled on Annadel State Park in Sonoma county. Man...were we in for a disappointment.
Upon reaching Annadel SP, we realized that all the trees were still green and showed no signs of changing color. It had rained a week ago and the ground was still moist. What were we thinking? If you want to see Fall colors in CA, you should just do it in your neighborhood park. Most of the trees here don't change color.
Anyway, we did a 4-5 mile loop hike to Lake Ilsanjo in Annadel. The only highlight was a small snake that crossed our path.
Next time, we'll just go to New England or Oregon or Seattle.
Monday, November 02, 2009
Volatiles and concurrency - there's still so much to learn
I was trying my hand at tuning some concurrency code last week. The good folks at the Concurrency-interest group showed me just how much there is yet to learn. Sigh..
Here is the Thread - Lazy fetching and volatile fields
Ashwin.
Friday, October 30, 2009
Andy Bechtolsheim speaks, GuestVM, Maxine and other tidbits
- Andy Bechtolsheim at HPTS 2009 on the future of Flash and other Computing technology trends. Hey, if Mr. Andy says so, then it has to be true
- Digg's use of Cassandra. This also means more contributors to Cassandra
- Ongoing research at Sun on VMs running directly on a thin virtualization layer - GuestVM and the Java meta VM that is Maxine. This space is certainly worth keeping an eye on because VMWare "might" eventually do this for Spring + OpenJDK + VMW, just like BEA's LiquidVM
- On a related note, Billy Newport had written about why Java in its current state might not be the best candidate for virtualization
 
- On an unrelated note, I wonder why VMWare has still not acquired GigaSpaces or GemStone or Terracotta or GridGain or any such technology after they acquired Spring
Thursday, October 29, 2009
Sunday, October 25, 2009
Software Inc - running, selling, failures and other interesting essays
Joy of Honesty in Business: A 5-part Series. I love this guy's blog.
Are sales people different from you and me? - Business of Software Blog. Something I as an Engineer have always wondered.
On a related note, what motivates people - money? Think again. TED - Dan Pink on Motivation
Interesting notes on why their SaaS BI startup LucidEra failed (from their ex-Director of Engg) - Why did startup LucidEra fail?
Sunday, October 18, 2009
Thursday, October 15, 2009
IntelliJ IDEA is now Open Source!
Wow, my favorite Java IDE - IntelliJ is now Open Source. It has to be because of intense competition and widespread adoption of Eclipse and, to some extent, NetBeans with its Groovy and other JVM language support.
Even though they are miles apart from IntelliJ, I guess "free" is a price that is very hard to compete with.
Scala and Groovy are going to lead the next wave, if you will, in JVM Languages. Even if they manage to build a sizable Developer community (not users, but contributors) around IntelliJ that can match Eclipse's, which is quite unlikely, I still wonder why they did it and how they intend to make any money.
I know that their .Net IDE - ReSharper is way better than Visual Studio. I suppose that is their only remaining cash cow now. Who knows?
Monday, October 12, 2009
Will Oracle be good for Java?
Scott McNealy asks the audience "Will Oracle be good for Java?". A rhetorical question. Or is it? Find out what James Gosling has to say. Funny but very reassuring.
(Via: taranfx.com)
Saturday, October 03, 2009
Thursday, October 01, 2009
Link: The VAR Guy’s Open Source 50
A good list of Open Source companies/projects to keep an eye on - The VAR Guy’s Open Source 50.
Tuesday, September 29, 2009
Monday, September 28, 2009
Analytic Databases today and tomorrow
2 very enlightening articles on the state of Analytic Databases today. There are loads of them...strange to see that there seem to be more ADs than the regular DBMSes.
1) Bloor - Analytic Warehousing
2) Big Honking Databases - Please Stop Making More ADBMS Sausage
Saturday, September 26, 2009
Hiking in Wunderlich County Park
Wunderlich County Park is a nice hiking area just off Highway 84, past Woodside. I did the 5+ mile Bear Gulch loop mostly under the shade of the Redwoods. It's an easy-moderate difficulty hike but well worth the hike up to The Meadows, which is the midpoint in the loop and has a bench overlooking part of the Bay Area. It's probably a gorgeous view in Spring.
Sunday, September 20, 2009
Tuesday, September 15, 2009
This month in Sci-Fi
Good Sci-Fi I just finished reading:
House of Suns by Alastair Reynolds - I'm glad I came across this book. I simply loved the story telling. Smooth and refreshingly original. Deep space epic with a liberal dose of Hard Sci-Fi. Mmmm...
The Fall of Hyperion by Dan Simmons. A sequel to Hyperion, which I happened to like a lot too. A part of the Hyperion Cantos, with its strange blend of hellish Time Tombs, The Shrike and the City of Sad King Billy in the background. What I liked in particular about this was the exploration of the Omega Theory - will God evolve from us as a being of supreme power and intellect or is he already there?
Spoiler alert:
It's also strange to see the similarities between The Matrix movie and the AI/Core in the novel. Especially so, when you consider that this novel came out before the movie. Hmm..who borrowed from who?
Friday, September 11, 2009
Monday, September 07, 2009
Friday, September 04, 2009
Windowing features in PostgreSQL 8.4
PostgreSQL recently added a whole bunch of mini-analytic features in version 8.4. It also happens to be Turing complete with support for recursion. Ha! Recursion in SQL...
To me, the most interesting feature was the introduction of the PARTITION BY and related clauses. I remember using this feature in Oracle 10g in 2005, which I think was part of the Analytics package if I'm not mistaken. It's important to me because this is what inspired me in some way to start working on StreamCruncher and explore other Windowing concepts that are now standard in any Event Stream Processing product.
SELECT key, SUM(val) OVER (PARTITION BY key) FROM tbl;
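If the windowing syntax is new to you, here is a small in-memory sketch of what SUM(val) OVER (PARTITION BY key) computes. The class and method names are made up for illustration and this is plain Java, not how PostgreSQL implements it. The point is that, unlike GROUP BY, every input row survives and simply gets its partition's total attached.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class PartitionSumDemo {

    /** One row of the hypothetical table tbl(key, val). */
    public static final class Row {
        public final String key;
        public final int val;

        public Row(String key, int val) {
            this.key = key;
            this.val = val;
        }
    }

    /**
     * Returns one "key=sum" string per input row, in input order, where sum
     * is the total of val over all the rows that share that row's key.
     */
    public static List<String> sumOverPartition(List<Row> rows) {
        // Pass 1: compute the total for each partition key.
        Map<String, Integer> totals = new HashMap<String, Integer>();
        for (Row row : rows) {
            Integer soFar = totals.get(row.key);
            totals.put(row.key, soFar == null ? row.val : soFar + row.val);
        }
        // Pass 2: emit every row again, each with its partition's total.
        List<String> out = new ArrayList<String>();
        for (Row row : rows) {
            out.add(row.key + "=" + totals.get(row.key));
        }
        return out;
    }

    public static void main(String[] args) {
        List<Row> rows = new ArrayList<Row>();
        rows.add(new Row("a", 1));
        rows.add(new Row("b", 10));
        rows.add(new Row("a", 2));
        System.out.println(sumOverPartition(rows)); // prints [a=3, b=10, a=3]
    }
}
```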
Thursday, August 20, 2009
A handy batch script to compile Java programs
[Updated: Dec 22, 2010]
I've always had some trouble remembering the right commands to compile a large-ish Java project from the command line when you don't have Ant at your disposal.
So I thought I should write about it so others (including myself) might find it handy for future reference.
The general idea is to have this kind of directory structure (duh!):
/myproject
    /src
    /classes
    /lib
Here is the batch file. Make changes to match your directories.
The dir and javac lines create a simple txt file of all the .java file names in the src folder and then compile them into the classes folder. The xcopy and jar lines copy the .properties files across and package everything into a jar.
rem Project root and a clean output folder
set basedir=c:\projects\myproject
del /Q %basedir%\classes
mkdir %basedir%\classes
rem Classpath and JDK location
set cp=%basedir%\lib\XXX.jar;%basedir%\lib\YYY.jar;%CLASSPATH%
set JAVA_HOME=C:\Programs1\jdk1.6.0
set PATH=%JAVA_HOME%\bin;%PATH%
rem List every .java file, then compile the whole list in one go
dir /b /s %basedir%\src\*.java > %basedir%\java_files_to_compile.txt
%JAVA_HOME%\bin\javac -classpath %cp% -d %basedir%\classes @%basedir%\java_files_to_compile.txt
rem Copy resources and build the jar
xcopy /S %basedir%\src\*.properties %basedir%\classes
jar -cvfm %basedir%\myproject-lib.jar %basedir%\manifest.mf -C %basedir%\classes/ .
Monday, August 17, 2009
Fork-Join paper and the ReLooper tool
Informative paper on the Fork-Join feature in JDK7 by Doug Lea, of course - Supporting Fine-Grained Parallelism in Java 7.
An Eclipse plugin that automatically refactors loops into the ParallelArray construct in JDK 7 - ReLooper.
Wednesday, August 05, 2009
Forrester report on Complex Event Processing Platforms
I guess CEP is now a mainstream technology according to Forrester. What took them so long, huh?
According to the report (of which you can only find excerpts unless you buy it), TIBCO BusinessEvents is in the forefront along with a few others. IBM and Oracle seem to be moving closer. Nice to see Esper also in the list. I wonder why JBoss Drools is not there.
Tuesday, July 28, 2009
Partitioning, Data Locality and some more reading material
From my ongoing reading about Distributed stores, Map-reduce, BASE and other related topics here's some "gyan" (alt meaning: musings of an armchair engineer):
Effect of Partitioning and Data Locality for scaling in a Distributed System:
If you are familiar with the Fallacies of Distributed Computing, 2 of those fallacies are very relevant even in a well replicated system, in an internal, trouble-free network.
Or rather, especially in an internal network where the network security, topology and bandwidth are not major issues. The ones that still cannot be ignored are:
» Latency is zero
» Transport cost is zero
The trouble with storing data in such a distributed system (key-value pairs or columns + column families) is that everything looks alright until you really need to cross-reference/lookup/join data between 2 such caches/stores.
Unlike in a regular database where you normalize everything into separate tables and then perform an Indexed-join across more than one table to re-assemble the data, there is no proper facility to do that efficiently in a distributed system. The cost is most often prohibitive because of the high latency that comes with moving, let's say, the left-hand-side keys of the join to the right-hand-side table. And if you are attempting to replicate a complex join plan like in a database, well, good luck.
To alleviate this problem, you would have to:
» De-normalize your data
» Or move all your relationships closer together in the cluster (Co-location)
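To get a feel for why Co-location helps, here is a toy sketch that counts how many nodes you have to visit to read a customer along with its orders. Everything in it is an assumption made for illustration: the 4-node cluster, the modulo-of-hashCode routing and the key names. Real stores route keys in far more sophisticated ways.

```java
import java.util.HashSet;
import java.util.Set;

public class CoLocationDemo {

    /** Size of the pretend cluster. */
    public static final int NODES = 4;

    /** Blind hashing: a key lands wherever its own hash says. */
    public static int nodeForKey(String key) {
        return Math.floorMod(key.hashCode(), NODES);
    }

    /**
     * Counts the distinct nodes touched when reading one customer and all of
     * its orders. With coLocate set, each order is routed by its customer's
     * id instead of its own id.
     */
    public static int nodesTouched(String customerId, String[] orderIds, boolean coLocate) {
        Set<Integer> nodes = new HashSet<Integer>();
        nodes.add(nodeForKey(customerId));
        for (String orderId : orderIds) {
            nodes.add(nodeForKey(coLocate ? customerId : orderId));
        }
        return nodes.size();
    }

    public static void main(String[] args) {
        String[] orders = {"order-1", "order-2", "order-3", "order-4", "order-5"};
        System.out.println("blind hashing: " + nodesTouched("customer-7", orders, false) + " node(s)");
        System.out.println("co-located:    " + nodesTouched("customer-7", orders, true) + " node(s)");
    }
}
```

With co-location the answer is always a single node; with blind hashing it depends on where each order id happens to hash, and every extra node is another network hop that pays the latency and transport cost mentioned above.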
Object-oriented DBs face a similar problem where you cannot perform Joins as freely as you would in a Relational DB. Their solution is a "pre-joined" or "always-joined but lazily fetched" graph structure.
The most common Distributed Hashtables that have been around in some form or another since the 90's like Chord, DHT, Bamboo etc have exactly these problems. Fast forward to 2009 and Memcached has a similar problem. To be fair Memcached was never meant to be anything more advanced than a large, evenly spread out, distributed hash table/store. Which it seems to excel at.
So, for anything more than simply caching MySQL data like Flickr images or Facebook friends list, this form of blindly hashing to nodes does not suffice. It makes perfect sense to use it in Freenet - which offers anonymous storage and retrieval in a P2P environment.
In fact, I read recently that Cassandra is considering implementing a locality preserving distribution mechanism.
A while ago, I read this very interesting article about Constrained Tree Schemas, which points to the next level in the Distributed System architecture. Namely, Compute Grids - send your code to where your data is. We've seen an incredible number of Open Source Data Grids. A few pure Compute clusters like GridGain and JPPF, but there are still very few projects/products (barring a few commercial Java products) that support Data + Compute in a seamless package. (For the curious reader, I was referring to features such as Key-Association, Key-Partitioning, Space-Routing etc in the commercial Grids. Do your own research now :-).
In schools today:
Also, I've noticed that real world Distributed Systems like GFS and Amazon Dynamo are being studied in courses these days. When I was in college, all we had was vague references to papers and thought experiments in books like Nancy Lynch's.
Good reading:
I've never really understood how all those Hadoop sub-projects fit together - especially HDFS. Here's an excellent article about how LinkedIn uses Voldemort and HDFS to build and transfer their multi-GB search index every day.
Until next time...cheers!
Sunday, July 26, 2009
Hiking in Huddart County Park
Today I did the Dean Trail - Crystal Springs Trail in Huddart County Park. A distance of about 4.5 miles. Probably a little more because the same trails keep criss-crossing towards the Toyon Camp area. It's not too bad. The park itself has a large recreational area with lawns and picnic benches.
Saturday, July 25, 2009
Resistance to "shady" Algo trading
If you've been following recent technology trends in high speed Algorithmic trading which sort of intersects with CEP/ESP technologies, then you must've wondered at some point about what goes on in those big Wall St companies like Hedge funds and the like. 
As individual investors (if you can call us that) dabbling in the stock market with minuscule amounts of hard earned money - we often ask ourselves if we really are getting a good deal.
It turns out that the mainstream Press has caught wind of their questionable activities. It's no surprise that the Finance industry does this but still when it's laid out in such simple words before you, it's quite horrifying:
1) Senator Wants Restrictions on High-Speed Trading
2) Is Wall Street Picking Our Pockets? 
Hmmm..
(Update Sep 14 2009):
Recent ban on Flash orders - Gone in .50 Seconds
Wednesday, July 22, 2009
To parse or not to parse
I remember spending a lot of time working on SQL extensions and ANTLR grammar not too long ago. This was during my StreamCruncher days.
Looking back, the effort it required to write a clear, unambiguous grammar, simplify the AST, validate it and then finally produce some executable code was considerable. I distinctly recall that it was not very enjoyable.
Nowadays, the Fluent interface is gaining popularity, especially for implementing mini-DSLs. Here's an introduction to designing DSLs in Java.
For clarity, I'll refer to the custom language/grammar as Stringified-DSL and the Fluent interface as Fluent-DSL.
Now, don't get me wrong I'm all for designing a concise and crisp "language" for specific tasks. That's the whole reason I built my program on SQL extensions (i.e Stringified). But does it always have to be a dumb string - something in quotes, which cannot be refactored, no IDE support like auto-complete, the hassle of providing a custom editor for it...?
It doesn't just end there. When you expose a custom Stringified-DSL you and your users are stuck with it for a very long time. If you want to expose any data structures, which your users will eventually ask for after playing with the simple stringified demos, then you will have to ship the new grammar to them, enhance your custom editor and document it all clearly. Also, suppose the underlying language - let's consider Java - suddenly decides to unleash a whole bunch of new awesome APIs, syntax and other language enhancements, which is exactly what happened with Java 5 if you recall - Annotations, Concurrency libraries, Generics. You see what your users would be missing because you chose the Stringified route?
In case you are wondering why your users would want to use Concurrency, data structures and such stuff - we are talking about Technical users and not Business users. Business users will be happy to play with Excel or a sexy UI, so we are not talking about their requirements.
So, what are the benefits of exposing a Fluent-DSL?
- Easy to introduce even in a minor version of the product. Quick turn around because it involves mostly APIs
- Documentation is easy because it can go as part of the JavaDocs
- Easy for the users to consume and use. Lower setup costs and only a mild learning curve
- IDE features - Refactoring, Type safety, no custom editor required
- Base language features - Users don't have to shoehorn their code into a clumsy custom editor, code can be unconstrained and they are free to use data structures, Annotations, Generics, Concurrency, Fork-Join and all the other programming language goodies. Which is very good in the long run because you don't have to paint yourself into a corner with all that custom stuff
- LambdaJ
- JaQu
- Those damned Closures that never made it
- FunctionalJava - to an extent
- MVEL - my favorite. The Interceptor feature is a nice touch
- JSF Expression Languages like - JUEL and SpEL
- MPS from the IntelliJ guys. I haven't understood this fully. It seems to be somewhat like MS Oslo, maybe less. But Oslo itself is hard to understand because of the poor arrangement of their docs
- DSLs in Groovy - although I'm not very convinced since Groovy itself has a learning curve
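To make the Stringified vs Fluent contrast a bit more concrete, here is a deliberately tiny Fluent-DSL sketch. It is not taken from any of the libraries listed above; IntQuery and its methods are names I made up for illustration. The same filter as a Stringified-DSL would be something like "select v from values where v > 10 and v < 50" sitting in quotes, invisible to the compiler and the IDE.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class FluentDslDemo {

    /** Hypothetical mini-DSL: IntQuery.from(values).greaterThan(10).lessThan(50).list() */
    public static final class IntQuery {
        private final List<Integer> source;
        private Integer lower; // null means "no lower bound"
        private Integer upper; // null means "no upper bound"

        private IntQuery(List<Integer> source) {
            this.source = source;
        }

        public static IntQuery from(List<Integer> source) {
            return new IntQuery(source);
        }

        /** Keeps values strictly greater than bound; returns this for chaining. */
        public IntQuery greaterThan(int bound) {
            this.lower = bound;
            return this;
        }

        /** Keeps values strictly less than bound; returns this for chaining. */
        public IntQuery lessThan(int bound) {
            this.upper = bound;
            return this;
        }

        /** Runs the query and returns the matching values in source order. */
        public List<Integer> list() {
            List<Integer> out = new ArrayList<Integer>();
            for (Integer v : source) {
                boolean aboveLower = lower == null || v > lower;
                boolean belowUpper = upper == null || v < upper;
                if (aboveLower && belowUpper) {
                    out.add(v);
                }
            }
            return out;
        }
    }

    public static void main(String[] args) {
        List<Integer> values = Arrays.asList(5, 20, 40, 75);
        System.out.println(IntQuery.from(values).greaterThan(10).lessThan(50).list()); // prints [20, 40]
    }
}
```

Renaming greaterThan() is an ordinary IDE refactoring, a typo is a compile error and the JavaDoc comments double as the documentation. None of that comes for free with a query hidden inside a string.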
Cheers!
Monday, July 06, 2009
Road trip - Zion National Park and Grand Canyon, North Rim
Last weekend I did a 2600 mile road trip to Zion National Park and the North Rim of the Grand Canyon.
Zion was pretty good. I was expecting it to be similar to Bryce and Grand Canyon, but I was quite wrong. The best part of Zion was the hike up the Virgin River in knee deep water, on the floor of the canyon. 
We could've done another more strenuous hike to the top if we had an extra day. Perhaps another time. In Zion you have to use the Shuttle bus service to see about half of the park, which is a good thing. It gets even better if your bus driver happens to be the hilarious Jim L. I don't know his last name but you will enjoy his tour.
It rained one evening and I was afraid that it would ruin our trip. Thankfully, it stopped after about a half hour and we were treated to some beautiful sights when the sun peeped out of the clouds just before setting.
You can drive from Zion's east entrance (not the main entrance) to the North rim of the Grand Canyon in about 2.5-3 hours if I remember correctly. But if you start out late in the evening from Zion, it's good to just stop at Kanab town and find lodging there.
The North Rim is best experienced if you make proper arrangements in advance and go hiking to the bottom of the Canyon. We didn't and so I felt that another trip is in order some time.
Until next time...cheers!
Sunday, June 21, 2009
Sanborn-Skyline County Park (this time for real)
This is another nice park that intersects the fabled Skyline trail. The entrance is on the way to Big Basin and very easy to miss, which I did a few weeks ago. So, this time I made the correct turn and made it about 2.5 hours before what looked like a Wedding party was about to start. There must've been about 25 cars lined up at the gate that I saw while heading back home.
Well, the park itself is quite big and very nice. I wouldn't mind attending a party on the lush lawns and cozy picnic spots. There's a $6 parking fee. It's manned, so you don't have to worry about exact change, I think.
There was only 1 "long" trail that was open. I started on the Peterson trail, which is quite steep. It then merges with Sanborn trail and then goes all the way to meet the Skyline trail. It's about 2.4 miles to the Skyline trail, maybe a little more. I continued for a few minutes on Skyline. Stopped for a snack and then headed back on the same Sanborn and Peterson trails.
Every time I go hiking under the Redwoods, I've noticed a sweet, mildly smoky, intoxicating smell, usually where there is a patch of sunlight. I've tried stopping to get a second lungful and never managed to find it again. It's been driving me nuts. This strange, ephemeral smell always reminds me of the weird Picnic at Hanging Rock movie.
Until next time...
Saturday, June 20, 2009
Trouble with Schemas
I had to spend some time this week working on XML schemas. The trouble with schemas is that you can go all out and implement an almost object-oriented inheritance hierarchy. Which is exactly what I did.
I unwittingly set foot into the rarefied area of substitutionGroups and the like. And trying to find information when you don't know what to look for is a huge problem. So to save you the trouble and for my own future reference, here's what I learned. I could be wrong in several places, but it should be a good starting point for you.
This is what I had in mind. I needed a generic Object and a simple Collection-like container that could hold 1 or more Objects. And if you are not using XMLSpy...well good luck!
This is what the Object looks like:
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns="http://www.javaforu.com/schemas/lesson1" xmlns:xs="http://www.w3.org/2001/XMLSchema" targetNamespace="http://www.javaforu.com/schemas/lesson1" elementFormDefault="qualified" attributeFormDefault="unqualified">
<xs:complexType name="GenericObjectType"></xs:complexType>
<xs:element name="GenericObject" type="GenericObjectType"></xs:element>
</xs:schema>
And this is what the Object Collection looks like:
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns="http://www.javaforu.com/schemas/lesson1" xmlns:xs="http://www.w3.org/2001/XMLSchema" targetNamespace="http://www.javaforu.com/schemas/lesson1" elementFormDefault="qualified" attributeFormDefault="unqualified">
<xs:complexType name="GenericObjectCollectionType">
<xs:sequence>
<xs:element ref="GenericObject" maxOccurs="unbounded"></xs:element>
</xs:sequence>
</xs:complexType>
<xs:element name="GenericObjectCollection" type="GenericObjectCollectionType"></xs:element>
</xs:schema>
That was easy enough. What I need now is a Person object that inherits from Object.
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns="http://www.javaforu.com/schemas/lesson1" xmlns:xs="http://www.w3.org/2001/XMLSchema" targetNamespace="http://www.javaforu.com/schemas/lesson1" elementFormDefault="qualified" attributeFormDefault="unqualified">
<xs:complexType name="PersonType">
<xs:complexContent>
<xs:extension base="GenericObjectType">
<xs:sequence>
<xs:element name="firstName" type="xs:string"></xs:element>
<xs:element name="lastName" type="xs:string"></xs:element>
<xs:element name="age" type="xs:positiveInteger"></xs:element>
<xs:element name="gender" type="xs:string"></xs:element>
</xs:sequence>
</xs:extension>
</xs:complexContent>
</xs:complexType>
<xs:element name="Person" type="PersonType" substitutionGroup="GenericObject"></xs:element>
</xs:schema>
Note the use of substitutionGroup="GenericObject" in the Person element. This means that I can create Person tags/elements wherever GenericObject tags are allowed. That's very cool! Here comes the clincher: if I need a special PersonCollection that only allows Person objects but still inherits from the GenericObjectCollection, this is what I'd have to do:
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns="http://www.javaforu.com/schemas/lesson1" xmlns:xs="http://www.w3.org/2001/XMLSchema" targetNamespace="http://www.javaforu.com/schemas/lesson1" elementFormDefault="qualified" attributeFormDefault="unqualified">
<xs:complexType name="PersonCollectionType">
<xs:complexContent>
<xs:restriction base="GenericObjectCollectionType">
<xs:sequence>
<xs:choice maxOccurs="unbounded">
<xs:element ref="Person"></xs:element>
</xs:choice>
</xs:sequence>
</xs:restriction>
</xs:complexContent>
</xs:complexType>
</xs:schema>
Now, pay attention: for some reason (I'm yet to discover why), you cannot just replace the ref to the GenericObject with a Person ref when you have an unbounded occurrence rule. Try it. It has to be wrapped in a choice rule. And of course substitutionGroup="GenericObjectCollection" will let you use the PersonCollection wherever the GenericObjectCollection is allowed.
<xs:element name="PersonCollection" type="PersonCollectionType" substitutionGroup="GenericObjectCollection"></xs:element>
This is what a sample XML looks like:
<?xml version="1.0" encoding="UTF-8"?>
<DocRoot xmlns="http://www.javaforu.com/schemas/lesson1" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.javaforu.com/schemas/lesson1 Lesson1.xsd">
<GenericObject></GenericObject>
<Person>
<firstName>sdsfxcv</firstName>
<lastName>xsdf</lastName>
<age>12</age>
<gender>m</gender>
</Person>
<GenericObjectCollection>
<GenericObject></GenericObject>
<Person>
<firstName>asd</firstName>
<lastName>fgdfgd</lastName>
<age>77</age>
<gender>f</gender>
</Person>
</GenericObjectCollection>
<PersonCollection>
<Person>
<firstName>abcd</firstName>
<lastName>def</lastName>
<age>20</age>
<gender>m</gender>
</Person>
</PersonCollection>
</DocRoot>
And the full schema:
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns="http://www.javaforu.com/schemas/lesson1" xmlns:xs="http://www.w3.org/2001/XMLSchema" targetNamespace="http://www.javaforu.com/schemas/lesson1" elementFormDefault="qualified" attributeFormDefault="unqualified">
<xs:complexType name="GenericObjectType"/>
<xs:complexType name="GenericObjectCollectionType">
<xs:sequence>
<xs:element ref="GenericObject" maxOccurs="unbounded"></xs:element>
</xs:sequence>
</xs:complexType>
<xs:complexType name="DocRootType">
<xs:sequence>
<xs:element ref="GenericObject" minOccurs="0" maxOccurs="unbounded"></xs:element>
<xs:element ref="GenericObjectCollection" minOccurs="0" maxOccurs="unbounded"></xs:element>
</xs:sequence>
</xs:complexType>
<xs:element name="GenericObject" type="GenericObjectType"></xs:element>
<xs:element name="GenericObjectCollection" type="GenericObjectCollectionType"></xs:element>
<xs:element name="DocRoot" type="DocRootType"></xs:element>
<!---->
<xs:complexType name="PersonType">
<xs:complexContent>
<xs:extension base="GenericObjectType">
<xs:sequence>
<xs:element name="firstName" type="xs:string"></xs:element>
<xs:element name="lastName" type="xs:string"></xs:element>
<xs:element name="age" type="xs:positiveInteger"></xs:element>
<xs:element name="gender" type="xs:string"></xs:element>
</xs:sequence>
</xs:extension>
</xs:complexContent>
</xs:complexType>
<xs:complexType name="PersonCollectionType">
<xs:complexContent>
<xs:restriction base="GenericObjectCollectionType">
<xs:sequence>
<xs:choice maxOccurs="unbounded">
<xs:element ref="Person"></xs:element>
</xs:choice>
</xs:sequence>
</xs:restriction>
</xs:complexContent>
</xs:complexType>
<xs:element name="Person" type="PersonType" substitutionGroup="GenericObject"></xs:element>
<xs:element name="PersonCollection" type="PersonCollectionType" substitutionGroup="GenericObjectCollection"></xs:element>
</xs:schema>
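If you don't have XMLSpy handy, the JDK's built-in validator can check the substitutionGroup behaviour. What follows is only a minimal sketch: the class name Lesson1SchemaCheck and its helper methods are made up for illustration, and the embedded schema is a trimmed copy of the one above (Person keeps only firstName and age, and the restricted PersonCollection is left out), so treat it as a starting point rather than a full test of the schema.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

// Hypothetical helper, not part of the original post.
class Lesson1SchemaCheck {
    static final String NS = "http://www.javaforu.com/schemas/lesson1";

    // Trimmed copy of the schema: a generic object, a collection of them,
    // and a Person that extends the generic type and joins its substitution group.
    static final String XSD =
        "<xs:schema xmlns='" + NS + "' xmlns:xs='http://www.w3.org/2001/XMLSchema'"
      + " targetNamespace='" + NS + "' elementFormDefault='qualified'>"
      + "<xs:complexType name='GenericObjectType'/>"
      + "<xs:element name='GenericObject' type='GenericObjectType'/>"
      + "<xs:complexType name='GenericObjectCollectionType'><xs:sequence>"
      + "<xs:element ref='GenericObject' maxOccurs='unbounded'/>"
      + "</xs:sequence></xs:complexType>"
      + "<xs:element name='GenericObjectCollection' type='GenericObjectCollectionType'/>"
      + "<xs:complexType name='PersonType'><xs:complexContent>"
      + "<xs:extension base='GenericObjectType'><xs:sequence>"
      + "<xs:element name='firstName' type='xs:string'/>"
      + "<xs:element name='age' type='xs:positiveInteger'/>"
      + "</xs:sequence></xs:extension></xs:complexContent></xs:complexType>"
      + "<xs:element name='Person' type='PersonType' substitutionGroup='GenericObject'/>"
      + "</xs:schema>";

    // Wraps the given child elements in a GenericObjectCollection root element.
    static String collection(String children) {
        return "<GenericObjectCollection xmlns='" + NS + "'>" + children
             + "</GenericObjectCollection>";
    }

    // Returns true if the document validates against the trimmed schema.
    static boolean isValid(String xml) {
        Schema schema;
        try {
            schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(XSD)));
        } catch (SAXException e) {
            // A failure here means the schema is broken, not the document.
            throw new IllegalStateException("Schema did not compile", e);
        }
        try {
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // the document violates the schema
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String person = "<Person><firstName>abcd</firstName><age>20</age></Person>";
        // A Person sits where a GenericObject is expected.
        System.out.println(isValid(collection("<GenericObject/>" + person)));
        // An element outside the substitution group is rejected.
        System.out.println(isValid(collection("<Unknown/>")));
    }
}
```

The schema is compiled on every call to keep the sketch short; in real code you would compile it once and reuse the Schema object.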
If you find any mistakes or have any other such interesting bits of information to share, I'd love to hear it.
Sunday, June 14, 2009
Pycon 2009 talk - Drop ACID and think about data
This is worth watching - Drop ACID and think about data. A nice summary of current cluster/cache/distributed db technologies.
Thursday, June 11, 2009
Tuesday, June 09, 2009
A cool 100K open network connections
A nice read - 100K simultaneous connections using Java NIO, actually Netty.
Saturday, June 06, 2009
Beautiful visualization of Huffman encoding
How I wish all algorithms were animated and illustrated this well - Huffman encoding. Yay! I now know how compression works.
Tuesday, June 02, 2009
Java powered Netbooks
This is interesting - Oracle's Ellison considers netbook market foray.
Also interesting to hear that JavaFX will get more support. Strange, considering that most people thought JavaFX/Desktop Java would die after the Oracle acquisition. More hints here with Larry Ellison and Scott McNealy on stage at the JavaOne 2009 inauguration (at 1:38:11). Not just that, but he was also eyeing the Google Android Netbook market. Seems natural enough to mere mortals like us now that the visionaries have laid out a plan of attack on the pseudo-JVM that is Android.
As an aside - Acer to sell Android netbook PCs in Q3.
Sunday, May 31, 2009
Food map
Restaurants, coffee shops and snack bars that my friends and/or I liked - Bay Area, Mumbai, Bangalore and other places.
Thursday, May 28, 2009
Jofti - where are they now?
I wonder what happened to the "Cache index provider" - Jofti. It looked like a very nice project and then it just died. Heck, they even moved their website without letting anyone know. Well..RIP.
Monday, May 11, 2009
All your BASE are belong to the Cloud
If you haven't heard about the CAP vs BASE debates yet, then you are definitely not following the "cloud" or distributed systems space very closely.
Before you wander off following these links, let me tell you briefly what CAP and BASE stand for:
CAP = Consistency, Availability, Partition-tolerance
BASE = Basically Available, Soft state, Eventual consistency
Apparently, the CAP theorem says that you can only have any 2 of the 3 at any point when you are talking about really large scale, horizontally scaled software. And this article explains quite clearly why that is so.
Dr. Eric Brewer of Inktomi (and Berkeley) fame described an alternative called BASE way back in 1998.
He said that your software has to be aware of its distributed nature. Instead of relying on the Database or some other centrally (single point of failure) shared resource like a File system to provide ACID guarantees, build your software such that it can accommodate failures of increasing severity. Only then can you really scale out.
There is plenty of material on the Web to do this research. I am aware of people using Distributed Caches and their variants on massive scales, such as Memcached, Hadoop, Cassandra, Voldemort etc. On the other hand, I have personally seen people using Distributed Caches running alongside traditional OLTP systems - almost the other end of the spectrum, where you have a modest 20-30 nodes.
However, I was wondering what people/projects in the middle of this cluster-scale are doing. I found 2 interesting presentations:
1) Financial Transaction Exchange at BetFair.com
2) Forging ahead - Scaling the BBC into Web/2.0   
The first BetFair presentation doesn't really talk about clusters, per se. But they do talk about their experience with various scalability models they attempted. Their architecture seems to be smack dab in the middle of this scale. They still use a single large Database for most of their work.
The second one is moderately interesting because they are very much like these other massive scale Web 2.0 systems. I have great respect for such systems, but I've already read about such things like Amazon, Flickr, Facebook and the like. But one really interesting thing they have done is use markers (like Airport Threat Level colors :-) to identify the health of the servers in the system. Based on this they decide what to do with the user requests. This is perhaps the clearest application of BASE that I've read about so far. Amazon's Dynamo also does this, I believe - but they use Vector clocks and other algorithms. 
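To make the health-marker idea concrete, here is a rough sketch of how a server could change what it does with a read depending on its current health level. Everything in it is an assumption made for illustration: the class HealthAwareReader, the GREEN/AMBER/RED levels and the two in-memory maps standing in for a backing store and a local cache are hypothetical, and this is not how the BBC system is actually built.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical illustration of BASE-style degradation; not taken from the presentation.
class HealthAwareReader {
    enum Health { GREEN, AMBER, RED }

    // Stand-ins for a shared backing store and a per-server local cache.
    private final Map<String, String> backingStore = new ConcurrentHashMap<>();
    private final Map<String, String> localCache = new ConcurrentHashMap<>();
    private volatile Health health = Health.GREEN;

    void setHealth(Health newHealth) {
        health = newHealth;
    }

    void write(String key, String value) {
        backingStore.put(key, value);
    }

    Optional<String> read(String key) {
        switch (health) {
            case GREEN: {
                // Healthy: read the latest value and refresh the local copy.
                String fresh = backingStore.get(key);
                if (fresh != null) {
                    localCache.put(key, fresh);
                }
                return Optional.ofNullable(fresh);
            }
            case AMBER:
                // Degraded: stay available by serving a possibly stale local copy.
                return Optional.ofNullable(localCache.get(key));
            default:
                // RED: shed the request instead of piling more load on the system.
                return Optional.empty();
        }
    }

    public static void main(String[] args) {
        HealthAwareReader reader = new HealthAwareReader();
        reader.write("headline", "v1");
        System.out.println(reader.read("headline")); // fresh read, also cached
        reader.write("headline", "v2");
        reader.setHealth(Health.AMBER);
        System.out.println(reader.read("headline")); // stale copy from the cache
    }
}
```

The AMBER branch is where the "Basically Available" and "Eventual consistency" parts show up: the caller gets an answer, just not necessarily the latest one.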
Anyway.. interesting reading.
Wednesday, May 06, 2009
Lessons in API design
I like to keep reminding myself about the importance of API design. This quote by the Computer Scientist Alan Kay sort of sums it up very nicely:
Simple things should be simple, complex things should be possible.
And this not too old presentation by Joshua Bloch is worth watching/reading - How to Design a Good API & Why it Matters.
Wednesday, April 29, 2009
JBoss Cache getting a makeover?
JBoss Cache getting a makeover or spawning a cool new product? Check out - Infinispan.
Tuesday, April 28, 2009
100G JVM heap?
Wow! - this is absolutely cool. I remember reading about a 3 Terabyte heap using JRockit a few years ago. Of course the GC delays were long and the number of objects/bytes allocated and de-allocated was smaller. But this story is a real life scenario. Nice.
Monday, April 27, 2009
Hiking in Sanborn-Skyline County Park
This is what the title would've been if I hadn't missed the turn completely. I was enjoying the drive a little too much while listening to Prairie Home Companion on NPR, and drove off to Big Basin instead. So, I did the Sempervirens Falls/Sequoia trail - a total of 3.4 miles - and then did a shorter hike on Dool trail. Love the Redwoods.
Friday, April 24, 2009
In-Memory databases are not dead
A nice coincidence - Marco talking about In-mem DBs and their relevance in Complex Event Processing almost at the same time I reported about Derby's In-mem backend.
Tuesday, April 21, 2009
Java JAVA Java
Oracle and Sun. Well, I had to write about it. Another voice in the crowd, but still - having worked on Java for almost 10 years now, I felt that I had to say something.
I guess we all saw this coming. It had to happen eventually. Sun was just running aground with year after year of loss making.
That apart, what almost all of us are really interested in is what direction Java will take under the captaincy of Oracle. Just 2 weeks ago, we were all warming up to the idea of IBM taking over Sun/Java and now almost overnight we see a complete turn of events.
I guess the thought of IBM owning Java would not have troubled us as much as Oracle owning Java. IBM is not new to working in/with the Community, no matter how half-hearted their attempts were/are. See Eclipse. And then Derby. Perhaps Eclipse alone redeems them. But Oracle? Will they still play nice with IBM and WebSphere? How about the little guys like Apache, JBoss, Spring and the rest of us mortals? Will the vibrant and thriving Java community still live? 
Questions..and more questions. Only time will tell. There is no doubt that the Java platform itself will grow stronger, but will it remain freely available and accessible to everyone without having to shell out $$$ to Oracle?
All we can hope is that Oracle will embrace Java and the Community rather than constrict it like a boa.
And of course we all know what will become of MySQL. All the startups, Web 2.0 and Cloud companies that relied so heavily on it will begin to wonder why they did not use PostgreSQL.
Sunday, April 19, 2009
Hiking in Monte Bello Open Space Preserve
My first hike this season and it was wonderful. I started late in the day and it was already quite warm - in the 80s. The meadows smelled wonderful, so much so that I was tempted to just lie in the shade of some tree and doze off. Something about the warm and sweet smell of a Spring day. Resisting the urge to lie down, I went on down Canyon Trail. I completed the loop by coming back via White Oak trail. Roughly 4 miles or so. I could've done the longer trail but I didn't want to stress myself, this being the first hike after almost 3 months :)
Friday, April 17, 2009
In-memory Derby DB
Heard from David Van Couvering that Derby 10.5 will support In-memory databases. I was looking for this feature some 3 years ago in Derby - old mailing list archive. There was a prototype of some sort, but it was kinda flaky. I'm very excited that 10.5 will have this built-in.
Hopefully this will ship with a future version of JDK6. Imagine what amazing things you'll be able to do then. This along with Map-Reduce in JDK7...ahh..shweet. (Geek alert!)
Tuesday, March 24, 2009
Object allocation improvements in the JDK
It's good to see that someone is very closely following the "Stack allocation" features that are being developed in the JDK - http://blog.juma.me.uk/2008/12/17/objects-with-no-allocation-overhead/.
It would be good if someone (anyone but me :) also had some info on the G1 collector that was back ported to JDK 6 release 14.
Tuesday, March 10, 2009
Interesting articles about Open Source from a finance angle
Some interesting articles I read recently (also for my future reference...Yeah, I know there are things called Bookmarks):
1) Are open source vendors more capital efficient?
2) Jonathan Schwartz - Technology Adoption
Thursday, February 19, 2009
Got cache?
In recent years, we've seen a general trend in data handling and management where databases are being relegated to a slightly lower pedestal in the overall scheme of things.
Umm..that was a mouthful, I apologize. What I meant was that even to a casual observer (who doesn't get worked up like you and me when you talk about gigabytes of transactional data), it is becoming obvious that networks are getting faster and faster, memory is getting cheaper (cheaper but not inexpensive) but the darned hard disks are still plodding at much slower rates. Gigantic distributed systems that were once restricted to college campuses with a lot of PhD people mucking about have slowly made their way into everyday projects. Thankfully, not the PhDs. Google made it look sexy to have a huge number of PCs storing data mostly in RAM, forming a "data cloud" (No, I certainly did not invent that term).
Some large banks have been caching their data in modest sized distributed caches. Most of them have been doing it for some time, but they were really clustered App Servers with a cache that just happened to be there. But now, we see people designing their systems to be intentionally distributed. Web 2.0 startups are another prime example. Some of them have written their own distributed caches and clusters, gunning for Google's market share no doubt. All this sudden interest has resulted in some wonderful open source projects.
 To prove my point, here's the list I'm talking about: (Some of them are new, some have been around for a while)
 Apache Hadoop: http://hadoop.apache.org/core/
 Hypertable: http://www.hypertable.org/ (similar to Hadoop's sister project)
 Memcached: http://www.danga.com/memcached/
 JBossCache: http://jbosscache.blogspot.com/
 Terracotta: http://www.terracotta.org/
As we try to think about where we are headed in terms of data management, it becomes fairly obvious that Databases, on which we've come to rely so much for storing/processing/slicing & dicing data, are not really suited for every job. We don't really have to create strict relational schemas, to hell with normalization, disk quotas...we can just store entire objects as-is in a cache. That way we don't have to waste time joining the parent-child relations over and over. Sounds a lot like OODBMS. Let the programmers have the freedom to store the objects optimally instead of a DBA instructing them. Distributed caches automatically come with fail-over capabilities, which makes things even simpler.
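Here is a tiny sketch of what "store entire objects as-is" looks like in code. The OrderCache, Order and LineItem classes are hypothetical, and a plain ConcurrentHashMap stands in for a real distributed cache such as the ones listed above, so this only shows the shape of the idea: the parent and its children are saved and fetched together, with no join.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical example; a local map stands in for a distributed cache.
class OrderCache {
    static final class LineItem {
        final String sku;
        final int quantity;

        LineItem(String sku, int quantity) {
            this.sku = sku;
            this.quantity = quantity;
        }
    }

    // The parent holds its children directly instead of pointing at rows in a child table.
    static final class Order {
        final String id;
        final List<LineItem> items;

        Order(String id, List<LineItem> items) {
            this.id = id;
            this.items = Collections.unmodifiableList(new ArrayList<>(items));
        }

        int totalQuantity() {
            int total = 0;
            for (LineItem item : items) {
                total += item.quantity;
            }
            return total;
        }
    }

    private final Map<String, Order> cache = new ConcurrentHashMap<>();

    // One put stores the whole object graph under the order's key.
    void put(Order order) {
        cache.put(order.id, order);
    }

    // One get returns the order with its line items already attached.
    Order get(String id) {
        return cache.get(id);
    }

    public static void main(String[] args) {
        OrderCache cache = new OrderCache();
        cache.put(new Order("o-1", Arrays.asList(
                new LineItem("sku-a", 2), new LineItem("sku-b", 3))));
        System.out.println(cache.get("o-1").totalQuantity());
    }
}
```

With a real distributed cache the objects would also need to be serializable, and the key is a good place for something like a primary key object rather than a bare String.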
I'm sure I can keep rambling, but you get the point. I'll keep updating this list. Just reading the blogs of these projects' authors is educative, to say the least.
Wednesday, January 28, 2009
Tuesday, January 13, 2009
Project Voldemort, another Dynamo clone
Incredible, another Amazon Dynamo clone - Project Voldemort. Is there no end to this list of Open Source Distributed Caches? I'm not complaining though.
Sunday, January 11, 2009
Hiking in Almaden Quicksilver County Park
Today I went hiking in Almaden Quicksilver County Park, Santa Clara. This is the site of an old Mercury mine. The trails are pretty ok. I started at the Hacienda Park entrance, went up Mine Hill Trail, then on Randol Trail. I turned right onto Buena Vista Trail, which then meets New Almaden Trail. I didn't like this stretch of the trail because it goes too close to the residential plots on Almaden Road. After about a mile, I started heading back on Hacienda Trail. Hacienda trail is quite steep. No cycles allowed, but it felt like a good workout. That's a total of 5.9 miles.
Tuesday, January 06, 2009
Device drivers in Java
Now that Real-Time Java is really here, it's time to look at some wacky stuff to do in Java I guess.
Sunday, January 04, 2009
The Money Masters
Wow! A friend told me about this documentary called "The Money Masters" about how money is "created" in the US, about how the Federal Reserve is really a quasi-public bank and other such mind boggling facts...It's a fairly long documentary. I've found the time to watch only the first part so far. But it's incredible, absolutely incredible.
Friday, January 02, 2009
jmemcached - pure Java port of memcached
Continuing my list of Java based Distributed Caches, here's another new one I spotted - jmemcached.
Ingenious online ETL
Here's some really amazing ETL (Extraction, Transformation and Loading) done by someone using Google Docs and Yahoo Pipes. I just loved it.