Saturday, March 03, 2012

Redis (and Jedis) - delightfully simple and focused NoSQL

Redis is an open source NoSQL project that I had not paid much attention to, largely because it didn't seem very special at the time, nor did it have a good persistence and paging story. Also, there is (or was) so much noise out there, the loudest being Memcached, Hadoop, Cassandra, Voldemort, Riak, MongoDB, etc., that it slipped my mind.

Last weekend I thought I'd give Redis another try. This time I just wanted to see Redis for what it is and not compare it with other solutions. So, as it says on the site:

Redis is an open source, advanced key-value store. It is often referred to as a data structure server since keys can contain strings, hashes, lists, sets and sorted sets.
That seemed interesting enough to warrant another look. So many projects need:
  • Simple, fast, light
  • In-memory (with optional checkpointing)
  • Fault tolerant / Sharded / Distributed
  • Shared access from many processes and machines
  • Some real data structures instead of just wimpy key-value
  • Flexible storage format - without needing crummy layers to hide/overcome limitations
  • Clean Java API
So I downloaded the Windows port of Redis and the Jedis JAR for the Java API:
  1. Unzip the Redis Windows zip file
  2. Copy the Jedis JAR file
  3. Go to the 64-bit or 32-bit folder and start "redis-server.exe"
  4. Write a simple Java program that uses Jedis to talk to the Redis server
  5. That's it
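Step 4 is where Jedis does the work for you. To show roughly what it is doing on the wire, here is a minimal sketch that talks to redis-server directly with nothing but the JDK, using the Redis protocol (RESP). Each command goes out as an array of bulk strings, and each reply comes back prefixed with a type byte (`+`, `-`, `:` or `$`). This is an illustration, not a replacement for Jedis. The class name `TinyRedisClient` and the key `greeting` are hypothetical. The sketch assumes a server on localhost at the default port 6379 and Java 11+ (for `readNBytes`), and it only understands the reply types that SET and GET need.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

// Minimal sketch of a Redis client using only the JDK (Java 11+).
// Assumes redis-server is listening on localhost:6379 (the default port).
// Handles only simple-string, error, integer and bulk-string replies.
public class TinyRedisClient {

    // Encodes a command such as SET greeting "hello redis" as a RESP array of bulk strings.
    public static byte[] encode(String... args) {
        StringBuilder sb = new StringBuilder();
        sb.append('*').append(args.length).append("\r\n");
        for (String arg : args) {
            // Bulk-string lengths are counted in bytes, not characters.
            int byteLength = arg.getBytes(StandardCharsets.UTF_8).length;
            sb.append('$').append(byteLength).append("\r\n").append(arg).append("\r\n");
        }
        return sb.toString().getBytes(StandardCharsets.UTF_8);
    }

    // Reads one CRLF-terminated line and returns it without the CRLF.
    static String readLine(InputStream in) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        int b;
        while ((b = in.read()) != '\r') {
            if (b == -1) {
                throw new IOException("Connection closed mid-reply");
            }
            buf.write(b);
        }
        in.read(); // consume the '\n' that follows '\r'
        return new String(buf.toByteArray(), StandardCharsets.UTF_8);
    }

    // Parses a single reply; returns null for a nil bulk string ($-1).
    public static String readReply(InputStream in) throws IOException {
        String line = readLine(in);
        if (line.isEmpty()) {
            throw new IOException("Empty reply line");
        }
        char type = line.charAt(0);
        String rest = line.substring(1);
        switch (type) {
            case '+': // simple string, e.g. OK
            case ':': // integer, returned here as text
                return rest;
            case '-': // error reply, surfaced as an exception
                throw new IOException("Redis error: " + rest);
            case '$': { // bulk string: byte length, then payload, then CRLF
                int len = Integer.parseInt(rest);
                if (len < 0) {
                    return null;
                }
                byte[] data = in.readNBytes(len);
                readLine(in); // consume the trailing CRLF
                return new String(data, StandardCharsets.UTF_8);
            }
            default:
                throw new IOException("Unsupported reply type: " + type);
        }
    }

    public static void main(String[] args) throws IOException {
        try (Socket socket = new Socket("localhost", 6379)) {
            OutputStream out = socket.getOutputStream();
            InputStream in = new BufferedInputStream(socket.getInputStream());

            out.write(encode("SET", "greeting", "hello redis"));
            out.flush();
            System.out.println("SET -> " + readReply(in));

            out.write(encode("GET", "greeting"));
            out.flush();
            System.out.println("GET -> " + readReply(in));
        }
    }
}
```

With Jedis, the same two round trips collapse to `jedis.set("greeting", "hello redis")` and `jedis.get("greeting")` on a `new Jedis("localhost", 6379)` connection. Jedis also handles array (multi-bulk) replies, pipelining, connection pooling and the rest of the command set, which is the point of using it.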
The small file sizes, the elegance and the simplicity reflect the fact that Redis and Jedis are written by individual, smart developers (with contributions, of course, but still...).

This is what the program looks like:

This is what the output looks like:

Redis offers many things you would otherwise have to look for in separate products: caching, expiry, key-value storage, a flexible column-family store, load and data distribution, fault tolerance and more. You can also experiment with transactions, pub-sub, sharding, master-slave replication, persistence, etc.

It cannot handle super-heavy data volumes (multi-terabyte, say), as its disk persistence is quite primitive. It also lacks complex querying and indexing (a common limitation in NoSQL), multi-threading, and more advanced, robust and consistent sharding, replication, distribution and expansion.

Having said that, we as architects sometimes over-engineer things or look for tools that can do everything well, which really ends up as "everything mediocre".

Some useful links:

Until next time!