Sunday, April 15, 2012

LSM, HMM, SQL, JVM and other TLAs (Three-Letter Acronyms)

Here's a bunch of articles that I read recently and found worth sharing. But before that, a minor digression:

Just JVM!

I remember having a conversation with my manager at my first job. She told me that I should stop learning Java and move to C# because Java would never survive. I just smiled politely and went and did exactly the opposite. That was almost 11 years ago. Today you see so many high-end startups in Silicon Valley and in other places (including the hipsters who are not Ruby'ing) running a lot of their infrastructure on the JVM.

C# is a great language, but you also need libraries and a lot of shared community knowledge to develop good software. I think C# is lacking in that area.

And then there was J2EE, which was absurdly complex, but not as bad as CORBA, which it replaced. That led to Spring, and now Spring is complex and bloated. Just last week, my wife (she is doing her Master's in Software Engineering), who had to use JEE and Spring for a project submission, said "The JEE 5 project was cake walk" compared to Spring. I burst out laughing at how things have changed (again).

I've always wondered what Spring was all about. Then I watched this presentation that had "NoSpring movement" on a slide - borrowed from the NoSQL movement but very apt. It brought a smile to my face.

A random and incomplete list of folks using the JVM platform for high volume data/transactions - #1, #2, #3, #4, #5, #6, #7, #8 and so on...

For a variety of reasons I love Coherence. In my mind it is the original Java NoSQL system - at least for the enterprise. Here are some recent articles I read about Coherence:
Some stuff about SQL and its cousin NoSQL.

If you've managed to reach this paragraph, then here's a nice treat for you - real Comp Sci stuff:
Until next time!

Saturday, March 03, 2012

Redis (and Jedis) - delightfully simple and focused NoSQL

Redis is an open source NoSQL project that I had not paid much attention to, largely because it didn't seem very special at the time, nor did it have a good persistence and paging story. Also, there is/was so much noise out there, the loudest coming from Memcached, Hadoop, Cassandra, Voldemort, Riak, MongoDB etc, that it slipped my mind.

Last weekend I thought I'd give Redis another try. This time I just wanted to see Redis for what it is and not compare it with other solutions. So, as it says on the site:

"Redis is an open source, advanced key-value store. It is often referred to as a data structure server since keys can contain strings, hashes, lists, sets and sorted sets."
Seemed interesting enough to warrant another look. There are so many projects that need:
  • Simple, fast, light
  • In-memory (with optional checkpointing)
  • Fault tolerant / Sharded / Distributed
  • Shared access from many processes and machines
  • Some real data structures instead of just wimpy key-value
  • Flexible storage format - without needing crummy layers to hide/overcome limitations
  • Clean Java API
So, I downloaded the Windows port of Redis and the Jedis JAR for the Java API.
  1. Unzip the Redis Windows zip file
  2. Copy the Jedis JAR file
  3. Go to the 64-bit or 32-bit folder and start "redis-server.exe"
  4. Write a simple Java program that uses Jedis to talk to the Redis server
  5. That's it
The small size of the files, the elegance and the simplicity reflect the fact that Redis and Jedis are each written by individual, smart developers. With contributions of course, but still...

This is what the program looks like:

This is what the output looks like:

Redis offers many of the things you'd otherwise have to look for in separate products, like cache, expiry, key-value, flexible column-family store, load and data distribution, fault tolerance and more. You can also experiment with transactions, pub-sub, sharding, master-slave, persistence etc.

It cannot do super heavy data volumes (like multi-terabyte) as its disk persistence is quite primitive. It also lacks complex querying and indexing features (a common limitation in NoSQL), multi-threading, and more advanced/robust/consistent sharding/replication/distribution/expansion.

Having said that, we as architects sometimes over-engineer things or look for tools that can do everything well, which really ends up as "everything mediocre".

Some useful links:

Until next time!

Saturday, January 28, 2012

Kotlin, Unix Philosophy, Cypher and loads of tech stuff to read

There are so many JVM-based languages out there and yet none of them seems to be able to take over from good old (has it really been that long?) Java. From my point of view (I'm sure it's shared by many others) most of these languages like Scala, JRuby, Groovy, Gosu, Clojure, Ceylon etc feel too far away from the center. They either propose radical syntax and idioms, seem overly clever, are too flexible and hence not clear/safe for the average developer, or are just not performant enough.

Take Scala for example. If you pay no heed to the sensation Yammer's leaked email to Typesafe caused and just read between the lines, there are some very real truths there about what a large population of developers, team leads and managers expect from a language before they can invest time, training, money, people and their careers into it.

Which is why I find JetBrains' Kotlin and, to some extent for non-server-side work, Google's Dart to be promising. Neither of these languages is about revolutionary/radical new concepts or "paradigm" shifts (ahem), nor do they hide their Java roots. Dart is definitely not JVM-based, but neither is Kotlin planning to stick to just the JVM (LLVM plans). They are both intended to be evolutionary cleanups with some state-of-the-art language features. Only time will tell if they succeed.

To add to our distractions, we have Node.js, which has some appeal if you are just out of school and easily impressed or have a very specific need. Sigh... I'm already sounding like a weather-beaten veteran who's seen too many flashes in the pan.

If you didn't know and are wondering what the actual JVM team has been doing in Java 7 in response to the demand from these new JVM-based languages, here's a hint (JSR 292, MethodHandles, InvokeDynamic...). Be warned, this is not for the faint of heart.
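
To give a flavour of what JSR 292 adds, here's a tiny Java 7+ sketch of a method handle in action. The class name MethodHandleSketch and the choice of String.concat as the target are mine, purely for illustration. It only shows the MethodHandle half of the story, not invokedynamic itself, which is a bytecode instruction that language runtimes emit rather than something you type in Java source.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

// Hypothetical demo class; not taken from any of the linked articles.
final class MethodHandleSketch {

    // Looks up String.concat(String) at runtime and calls it through a MethodHandle.
    static String concatViaHandle(String left, String right) {
        try {
            MethodHandles.Lookup lookup = MethodHandles.lookup();
            // Describes the target signature: returns String, takes one String argument.
            MethodType type = MethodType.methodType(String.class, String.class);
            MethodHandle concat = lookup.findVirtual(String.class, "concat", type);
            // invokeExact needs the call site types to match the handle exactly:
            // (String receiver, String argument) -> String.
            return (String) concat.invokeExact(left, right);
        } catch (Throwable t) {
            throw new IllegalStateException("method handle call failed", t);
        }
    }

    public static void main(String[] args) {
        System.out.println(concatViaHandle("invoke", "dynamic"));
    }
}
```

Unlike reflection, the access check happens once at lookup time, and the handle is strongly typed, which is part of what lets the JIT treat such calls more like ordinary ones.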

Enough of that. Now on to Bloom filters, a wonderful data structure heavily used by many NoSQL systems like Hive, Cassandra etc. Here's a very nice visualization.
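
If you haven't met a Bloom filter before, the idea fits in a few lines: set k bits per item on insert, and on lookup report "maybe present" only if all k bits are set. Here's a minimal Java sketch. It is not how Cassandra or Hive implement theirs; the class name BloomFilterSketch, the sizes and the double-hashing scheme (String.hashCode combined with an FNV-1a hash) are my own assumptions for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.BitSet;

// Hypothetical, deliberately simple Bloom filter for strings.
final class BloomFilterSketch {
    private final BitSet bits;
    private final int size;       // number of bits in the filter
    private final int hashCount;  // number of bit positions set per item

    BloomFilterSketch(int size, int hashCount) {
        if (size <= 0 || hashCount <= 0) {
            throw new IllegalArgumentException("size and hashCount must be positive");
        }
        this.size = size;
        this.hashCount = hashCount;
        this.bits = new BitSet(size);
    }

    // Sets hashCount bit positions derived from the item.
    void add(String item) {
        for (int i = 0; i < hashCount; i++) {
            bits.set(index(item, i));
        }
    }

    // false means "definitely absent"; true means "possibly present" (false positives can happen).
    boolean mightContain(String item) {
        for (int i = 0; i < hashCount; i++) {
            if (!bits.get(index(item, i))) {
                return false;
            }
        }
        return true;
    }

    // Double hashing: the i-th position is h1 + i * h2, reduced into [0, size).
    private int index(String item, int i) {
        int h1 = item.hashCode();
        int h2 = fnv1a(item) | 1; // force odd so successive probes differ
        return Math.floorMod(h1 + i * h2, size);
    }

    // 32-bit FNV-1a over the UTF-8 bytes, used as the second hash.
    private static int fnv1a(String item) {
        int hash = 0x811C9DC5;
        for (byte b : item.getBytes(StandardCharsets.UTF_8)) {
            hash ^= (b & 0xff);
            hash *= 0x01000193;
        }
        return hash;
    }
}
```

The useful property is the asymmetry: a "no" is always right, so a store can skip a disk read entirely, while a "yes" only means it is worth looking.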

I've written about Neo4J the graph database before. I really like the concept but I haven't found an excuse yet (partially because it's LGPL) to use it. They even have a nicely done graph access language called Cypher, sort of like SQL for graphs. Here's a nice demo of Neo4J, Gephi and spatial data. It makes sense for desktop use, but I don't know if they have many server/enterprise grade users with many concurrent transactions.

If you are looking for some general tech reading, like while on a plane, here's a nice set of essays (also a book) - The Architecture of Open Source Applications.

Speaking of Open Source, I had not read or heard about The Unix Philosophy. I like the simplicity and focus. Obviously it has had great success, but it applies to other software too. Sort of like Dieter Rams' ideas on design, but for software.

If you've made it this far and are dying for some hardcore tech, then here's some on CAS concurrency, Azul's proposed Linux kernel patch and thread affinity.
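
In case CAS (compare-and-swap) is new to you, the basic pattern is a retry loop: read the current value, compute the new one, and publish it only if nobody else changed the value in between. Here's a minimal Java sketch using AtomicLong.compareAndSet. The class CasCounterSketch and its runConcurrently helper are hypothetical names of mine, not something from the linked articles.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical lock-free counter built on a compare-and-swap retry loop.
final class CasCounterSketch {
    private final AtomicLong value = new AtomicLong();

    // Retries until the swap succeeds; no locks are taken.
    long increment() {
        while (true) {
            long current = value.get();
            long next = current + 1;
            // Succeeds only if value still equals 'current'; otherwise another thread won, so retry.
            if (value.compareAndSet(current, next)) {
                return next;
            }
        }
    }

    long get() {
        return value.get();
    }

    // Starts the given number of threads, each incrementing a shared counter, and returns the total.
    static long runConcurrently(int threads, int incrementsPerThread) {
        CasCounterSketch counter = new CasCounterSketch();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < incrementsPerThread; i++) {
                    counter.increment();
                }
            });
            workers[t].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return counter.get();
    }

    public static void main(String[] args) {
        System.out.println(runConcurrently(4, 1000));
    }
}
```

In real code you'd just call AtomicLong.incrementAndGet, which does the same job for you; the loop is spelled out here only to show the pattern.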

These days, any blog is incomplete without mention of BigData. So, here's some BigData reading material - slides from the Cassandra 2011 summit.

Ashwin.