Saturday, December 03, 2011

Some interesting talks from Camelone, Eurocon, HadoopWorld, distributed joins and more

If you haven't used or even heard of Apache Camel, this lovely little integration framework, then I highly recommend watching this video by one of its founders. It's best described as a mini-ESB: an implementation of the Enterprise Application Integration Patterns (EAI) with API/programmatic access and a zillion connectors, and it is a delight to use.

Also worth watching is the EAI intro video by Gregor Hohpe at Camelone 2011.

Some interesting HadoopWorld 2011 slides:

 Apache Eurocon 2011 slides, worth reading - Lucene at Twitter.
Joins - the Achilles heel of NoSql and distributed systems. Possible, but difficult and not always feasible. Yet, several solutions are on offer, in addition to those from the big data warehouse vendors and the do-it-yourselfers.
More lessons on IO - Disk I/O abstraction layer X-rays.

You thought System.nanoTime() was faster than using System.currentTimeMillis() for profiling? Well.. it's all relative.
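A quick way to see this for yourself is to time the calls on your own machine. The sketch below is only a rough illustration, not a rigorous benchmark: the class and method names are made up for this example, and the numbers depend heavily on the OS, the clock source and the JIT.

```java
// Rough sketch only: average cost per call of the two JDK clock methods.
// ClockCostSketch and its method names are hypothetical, invented for this example.
class ClockCostSketch {

    // Written to at the end of each run so the JIT cannot discard the loops as dead code.
    static volatile long sink;

    /** Average nanoseconds spent per System.nanoTime() call. */
    static double nanoTimeCost(int calls) {
        long acc = 0;
        long start = System.nanoTime();
        for (int i = 0; i < calls; i++) {
            acc += System.nanoTime();
        }
        long elapsed = System.nanoTime() - start;
        sink = acc;
        return (double) elapsed / calls;
    }

    /** Average nanoseconds spent per System.currentTimeMillis() call. */
    static double currentTimeMillisCost(int calls) {
        long acc = 0;
        long start = System.nanoTime();
        for (int i = 0; i < calls; i++) {
            acc += System.currentTimeMillis();
        }
        long elapsed = System.nanoTime() - start;
        sink = acc;
        return (double) elapsed / calls;
    }

    public static void main(String[] args) {
        int calls = 5_000_000;
        // Several rounds: the first ones mostly measure JIT warm up.
        for (int round = 0; round < 5; round++) {
            System.out.printf("nanoTime: %.1f ns/call, currentTimeMillis: %.1f ns/call%n",
                    nanoTimeCost(calls), currentTimeMillisCost(calls));
        }
    }
}
```

Run it on a bare-metal box and inside a VM and compare; which call is cheaper is not the same everywhere.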

The old JCache API seems to have been revived - JSR107: The new Caching Standard. It's also going to be a part of the next JEE spec. More signs of commoditization - a good sign of progress.

Until next time!

Monday, October 31, 2011

Garbage collection, memory, IO and other scary stories

Happy Halloween! Want to read some scary, low-level, systems related stuff? Here's a short list of very useful articles I read recently:

SQL anti-patterns.

GC horror stories in the .Net world.

Still not scared? How about some math and statistics:

Saturday, September 24, 2011

Hiking in Fremont Older Open Space Preserve

So close to South Bay Area. Prospect Road trail is a little steep but short. Worth visiting again because there are so many trails.

Tuesday, September 13, 2011

Monday = one of the days of the week when you forget if you had coffee and have to look in the trash can for the discarded coffee cup, just to check.

Offloading data from the JVM heap (a little experiment)

Last time, I wrote about the possibility of using Linux shared memory to offload cacheable/reference data from the JVM. To that end I wrote a small Java program to see if it was practical. The results were better (some even stranger) than I had expected.

Here's what the test program does:

  • Create a bunch of java.nio.ByteBuffers that add up to 96MB of storage
  • Write ints starting from the first buffer, all the way to the last one - that's writing a total of 96MB of some contrived data
  • For each test, the buffer creation, writing and deletion is done 24 times (JIT warm up)
  • For each such test iteration, measure the memory (roughly) used in the JVM heap, the time taken to create those buffers and the time taken to write 96MB of data
  • Obviously, there are things here that sound fishy to you - like why use ByteBuffers instead of just writing to an OutputStream or why write to the buffers in sequence. Well, my intentions were just to get a ballpark figure as to the performance and the viability of moving data off the JVM heap
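The steps above can be sketched roughly as follows. This is not the original test program, just a minimal illustration of the idea: the class name, method names and the printed format are assumptions, and it covers only the heap and direct buffer variants.

```java
import java.nio.ByteBuffer;

// Minimal sketch (not the original test): allocate blocks, fill them with ints, time it.
// BlockWriteSketch and its methods are hypothetical names used only for illustration.
class BlockWriteSketch {

    /** Allocates enough equal-sized blocks to hold totalBytes, on or off the JVM heap. */
    static ByteBuffer[] allocate(int totalBytes, int blockSize, boolean direct) {
        int count = totalBytes / blockSize;
        ByteBuffer[] blocks = new ByteBuffer[count];
        for (int i = 0; i < count; i++) {
            blocks[i] = direct ? ByteBuffer.allocateDirect(blockSize)
                               : ByteBuffer.allocate(blockSize);
        }
        return blocks;
    }

    /** Writes consecutive ints from the first block to the last; returns how many were written. */
    static long fill(ByteBuffer[] blocks) {
        long written = 0;
        int value = 0;
        for (ByteBuffer block : blocks) {
            block.clear();
            while (block.remaining() >= Integer.BYTES) {
                block.putInt(value++);
                written++;
            }
        }
        return written;
    }

    public static void main(String[] args) {
        int total = 96 * 1024 * 1024;     // 96MB in total, as in the test description
        int blockSize = 4 * 1024 * 1024;  // 4MB blocks; try 4096 for the small-block variant
        for (boolean direct : new boolean[] {false, true}) {
            for (int run = 0; run < 24; run++) {  // repeated to let the JIT warm up
                long start = System.nanoTime();
                ByteBuffer[] blocks = allocate(total, blockSize, direct);
                long ints = fill(blocks);
                long millis = (System.nanoTime() - start) / 1_000_000;
                System.out.printf("%s run=%d ints=%d millis=%d%n",
                        direct ? "direct" : "heap", run, ints, millis);
            }
        }
    }
}
```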
About the test:
  • There are really 5 different ways to create the buffers. Then there are 2 variations of these tests in which the buffer sizes vary (blocks), but the total bytes written are the same
  • The buffers (blocks) for each variation are created as:
    • Ordinary HeapByteBuffers inside the JVM heap itself - as a baseline for performance
    • DirectByteBuffers
    • A file created on Ext4fs using RandomAccessFile and parts of the file are memory mapped using the FileChannel. The file is opened in "rw" mode. Other options are "rwd" and "rws"
    • The same as above, but the file resides in /dev/shm, the in-memory, shared memory virtual file system (Tmpfs)
    • The buffers are created using Apache's Tomcat Native Libraries which in turn use Apache Portable Runtime libraries. The Shared memory (Shm) feature was used to create the buffers. This is similar to DirectByteBuffers but the buffers reside in a common area in OS memory, not owned by any one process but shared between processes (similar to /dev/shm but without the filesystem wrapper overhead)
  • The machine used to test was my moderately powered Windows 7 home laptop with 8GB RAM and a 2.3GHz i5, running a Cloudera Ubuntu Linux image in VMWare Player. There were a few other processes running, but nothing that was using CPU extensively. 500MB+ memory was free and available
  • The VM had 1GB RAM and the JVM heap was 256MB
  • The test program was run once for each configuration, but each test itself ran 24 times to allow the JIT to warmup and even the file system caches to stay warm where needed
  • The test prints out the timings with headers which were then compiled into a single text file and then analyzed in RStudio
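For the two file-based variants, the buffers come from memory mapping regions of a file. Here is a minimal sketch of that step, assuming a hypothetical class MappedBlockSketch and a default file name taken from the results table; pointing the path at /dev/shm gives the Tmpfs variant.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

// Minimal sketch of the memory mapped variants; names here are hypothetical.
class MappedBlockSketch {

    /** Maps `blocks` consecutive regions of blockSize bytes from the file at path. */
    static MappedByteBuffer[] mapBlocks(String path, int blocks, int blockSize) {
        MappedByteBuffer[] mapped = new MappedByteBuffer[blocks];
        // "rw" mode as in the test; "rwd" and "rws" are the other options.
        try (RandomAccessFile file = new RandomAccessFile(path, "rw");
             FileChannel channel = file.getChannel()) {
            for (int i = 0; i < blocks; i++) {
                long offset = (long) i * blockSize;
                mapped[i] = channel.map(FileChannel.MapMode.READ_WRITE, offset, blockSize);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // A mapping stays valid after the channel that created it is closed.
        return mapped;
    }

    public static void main(String[] args) {
        // e.g. pass /dev/shm/mmap_test.dat to use Tmpfs instead of the disk
        String path = args.length > 0 ? args[0] : "mmap_test.dat";
        MappedByteBuffer[] blocks = mapBlocks(path, 24, 4 * 1024 * 1024);
        int value = 0;
        for (MappedByteBuffer block : blocks) {
            while (block.remaining() >= Integer.BYTES) {
                block.putInt(value++);
            }
        }
        System.out.println("Wrote " + value + " ints to " + path);
    }
}
```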


  block_size                      test_type perctile95_buffer_create_and_work_time_millis perctile95_mem_bytes
1       4096                         direct                                       1555.65              3047456
2       4096 file_/dev/shm/mmap_test.dat_rw                                        661.70              3047632
3       4096          file_mmap_test.dat_rw                                       2055.75              3047632
4       4096                           heap                                       1071.15            102334496
5    4194304                         direct                                        653.85                 3008
6    4194304 file_/dev/shm/mmap_test.dat_rw                                        561.40                 3184
7    4194304          file_mmap_test.dat_rw                                       3878.25                 3184
8    4194304                           heap                                       1064.80            100664960
9    4194304                            shm                                        678.40                 2496

Interpretation of the results:
  • The test where block size was 4KB had quite a lot of memory overhead for the non-Java-heap ByteBuffers. Memory mapping was also slow for these small sizes as the Javadocs itself says for
  • The JVM heap test was slower than I had expected (for larger ByteBuffers). I was expecting that to be the fastest. Perhaps it was the small memory (1GB) virtualized OS it was running in. For smaller block sizes approaching 1K or less, the JVM heap performance is unbeatable. But in these tests, the focus was on larger block sizes
  • The Apache shared memory test would not even start for the 4KB tests as it would complain about "not enough space"
  • Almost everything fared well in the larger 4MB test. The per-block overhead was less for the off-heap tests and also the performance was nearly identical for /dev/shm, Apache Shm and DirectByteBuffer
  • The sources for this test are available here.
  • To run all the tests except Apache Shm you only need to compile JavaBufferTest and run it with the correct parameters
  • To run all tests, you can use the sub-class AprBufferTest which can test Apache Shm and also the remaining tests. To compile this you'll need tomcat-coyote.jar from apache-tomcat-7.0+. To run this you'll need the Jar file and the Tomcat Native bindings - tcnative.dll or libtcnative for Linux
There are advantages (and, as the last two points show, some drawbacks) to using ByteBuffers outside the Java heap:
  • The mapped file or the shared memory segments can outlive the JVM's life span. Another process can come in and attach to it and read the data
  • Reduces the GC pressure
  • Zero-copy transfers are possible to another file, network or device using FileChannel.transferTo()
  • Several projects and products have used this approach to host large volumes of data in the JVM
  • The data has to be stored, read and written using primitives to the ByteBuffer - putInt(), putFloat(), putChar() etc
  • Java objects cannot be read/written like in a simple Java program. Everything has to be serialized and deserialized back and forth from the buffers. This adds to latency and also makes it less user friendly
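To make the last two points concrete, here is a small sketch of what "serializing by hand" looks like. The record layout (an int id, a float price and a length-prefixed UTF-8 name) is made up purely for illustration, as are the class and method names.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Sketch of storing a record in an off-heap buffer using primitives only.
// The layout and all names are hypothetical: [int id][float price][int nameLength][name bytes].
class OffHeapRecordSketch {

    /** Writes one record at the given offset and returns the number of bytes used. */
    static int write(ByteBuffer buf, int offset, int id, float price, String name) {
        byte[] nameBytes = name.getBytes(StandardCharsets.UTF_8);
        ByteBuffer view = buf.duplicate();  // own position, same underlying memory
        view.position(offset);
        view.putInt(id).putFloat(price).putInt(nameBytes.length).put(nameBytes);
        return view.position() - offset;
    }

    static int readId(ByteBuffer buf, int offset) {
        return buf.getInt(offset);
    }

    static float readPrice(ByteBuffer buf, int offset) {
        return buf.getFloat(offset + 4);
    }

    /** Reading the name means copying bytes back onto the heap and decoding them. */
    static String readName(ByteBuffer buf, int offset) {
        ByteBuffer view = buf.duplicate();
        view.position(offset + 8);
        byte[] nameBytes = new byte[view.getInt()];
        view.get(nameBytes);
        return new String(nameBytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        ByteBuffer buf = ByteBuffer.allocateDirect(1024);
        int used = write(buf, 0, 7, 1.5f, "widget");
        System.out.println(used + " bytes: " + readId(buf, 0) + ", "
                + readPrice(buf, 0) + ", " + readName(buf, 0));
    }
}
```

Every field access goes through an offset calculation, and strings are copied and decoded on every read. That is the latency and usability cost mentioned above.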
Misc notes and references:
  • These tests can also be run on Windows, except for the Linux Tmpfs tests. However a RAM Drive can be used to achieve something similar


Tuesday, September 06, 2011

Blue Lakes and Clear Lake (CA's biggest natural lake)

Clear Lake in Kelseyville is a nice and peaceful place to visit and spend a weekend. Unlike Tahoe this feels cozier, less commercialized and is close to Napa. You can rent lake front property/spacious homes, relax and unwind. It is also the largest natural freshwater lake entirely in California (Tahoe is not because it spreads into Nevada).

Blue Lakes is a smaller lake about 45 minutes from Clear Lake (Soda Bay) and is a perfect spot to float around on the lake and spend a warm afternoon with family and friends.

View Trip/05Sep2011 in a larger map


Thursday, September 01, 2011

RAM disk is already in Linux and nobody told you (a.k.a Shared memory for ordinary folk)

The Linux 2.6 Kernel these days comes with an in memory file system. This is really shared memory across processes ("Everything old is new again!").

The beauty of the Linux implementation (like everything else) is that this shared memory looks like a regular file system - /dev/shm (but is really Tmpfs). So, your application - yes even a Java application can use this shared memory system across processes as if it were just writing to a regular file. You can create directories, setup a memory quota for this file system, tail files, change to the directories etc like any other directory, open-write-close files that can be read by other processes. Convenient!

RAM disks are not a new concept. But a RAM disk driver that is built into the kernel adds a totally different level of credibility and trust to the concept.

All the contents of this directory are in memory. Nothing gets written to the disk, no flush, no IO wait times. Naturally, this being in-memory you will lose all the contents when your OS reboots. What possible use can this be to us, I hear you ask? Well well.. where do I begin:

  • Memory is cheap. 96GB RAM is quite common these days on server class machines
  • You can run a big RDBMS on this file system. Typically these databases are anyway replicated/clustered over the network
    • So you can run the entire DB in-memory and still have HA because the replica is running on another machine anyway (low "disk" latency + high TPS + HA)
    • Why write to a local disk which can crash anytime
    • Why spend so much on expensive SSDs
    • 10GigE is already here
  • Run medium sized JVMs and push all the heavy data to this shared memory filesystem
    • You free up the heap to do simple in-JVM caching and reduce the pressure on GC by moving all the data to /dev/shm
    • If your JVM crashes, another JVM can be notified to pick up that data since it is just stored as a bunch of files and directories
  • People used to do IPC all the time using old fashioned shared memory constructs but it fell out of favor because networks and network drivers became quite fast
    • Also moving away from IPC to TCP-over-localhost gave you freedom to spread your processes across machines (TCP-over-GigE)
    • Perhaps it is now worthwhile to re-examine that approach and shave precious milliseconds by not copying data back and forth between processes over sockets
To that end I wrote a simple Java class to measure the performance of writing to file on /dev/shm compared to a regular disk based file on /tmp. The results are interesting and I'm hoping will make you readers re-think about your software systems ("Everything old is new ...").

The program is a simple Java file writer that forces a flush every few tens of bytes (which means nothing when writing to /dev/shm). The destination file and its path can be specified as a command line argument, so it's easy to compare performance. I ran these tests on a Cloudera-Ubuntu VMWare image running on my 64-bit Windows 7 laptop with 4GB RAM and a 2.3 GHz dual core Intel i5. Not surprisingly, in-memory is 7x faster for a 112KB file. Also, the laptop was running on batteries, which means the processor speed steps down to save power.
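A writer along those lines might look like the sketch below. It is not the original program: the class name, the 32 byte flush interval and the use of FileDescriptor.sync() to force the flush are all assumptions made for illustration.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;

// Sketch of a flush-heavy file writer; the names and the flush strategy are assumptions.
class FlushingWriterSketch {

    /** Writes totalBytes to path, forcing a flush every flushEvery bytes. Returns elapsed nanos. */
    static long write(String path, int totalBytes, int flushEvery) {
        byte[] chunk = new byte[flushEvery];
        Arrays.fill(chunk, (byte) 'x');
        long start = System.nanoTime();
        try (FileOutputStream out = new FileOutputStream(path)) {
            int written = 0;
            while (written < totalBytes) {
                int n = Math.min(flushEvery, totalBytes - written);
                out.write(chunk, 0, n);
                out.flush();
                out.getFD().sync();  // ask the OS to push the bytes to the device
                written += n;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        // Pass the destination as the first argument, e.g. /dev/shm/test.dat or /tmp/test.dat
        String path = args.length > 0 ? args[0] : "test.dat";
        long nanos = write(path, 112 * 1024, 32);
        System.out.printf("%s: %d bytes in %.2f ms%n",
                path, new File(path).length(), nanos / 1_000_000.0);
    }
}
```

Run it once against a file under /dev/shm and once against a file under /tmp and compare the timings.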

Why people do not talk about this relatively new Linux feature out loud is perplexing.

Other interesting commands you can run against this file system:
  • mkdir
  • cat
  • tail, grep
  • ipcs -a
  • df -k
Follow up articles:
Detailed log:

Until next time,

Sunday, August 28, 2011

Guicing up your app with Google Guice

I've created a simple but not completely trivial demo application using Guice. This is something I did to understand and compare Google-Guice with JBoss Weld's CDI RI for Java SE.

This project is a pure Guice project called Guice-Demo (duh) . Here's the source. Well, that's all there is. I thought this would be of some use to other first time users. The project wiki/docs are minimal and I might add some info later on, when I have the time. Until then I hope that the code is self explanatory.

If you are interested in the Java CDI standard, then this is useful. Weld also works on JavaSE but gives you no control over classpath scanning for injection, which makes start up times slower and somewhat wasteful.

Guice's module creation is very straightforward. It also allows creating multiple Injector instances.

Neither Weld nor Guice is very unit test friendly, as you will see here.


Saturday, August 27, 2011

Hiking in Stevens Creek County Park

Accessible and a relaxing walk around the reservoir that can easily convert to a longer hike. It's surprising that I had never been to this place, less than 15 minutes away. Avoid the main entrance on Balboa road which requires a fee to park. Try the West campus entrance where street parking is allowed (looked like).

There are 2 adjacent parks that are probably worth visiting as well.

Saturday, August 13, 2011

In-memory DB tables, shared nothing, time series, low latency and others

Some useful articles I read in the past few weeks.

Big data, shared nothing, distributed SQL, in-memory SQL:

New age time series databases:
Very good and detailed accounts of low latency Java based servers:
Disk I/O and time related notes.

Programming and Java:

Until next time,

Monday, August 01, 2011

Java 7's j.u.c.Phaser - a short tutorial

If you've tried to figure out how the Phaser works in Java 7 and haven't had much luck, then you are not alone. I too had some difficulty understanding what looks like a very esoteric concurrency construct. I asked around on the Concurrency-Interest forum. Folks there are very helpful and I almost understood it but was missing some details.

So, I decided to get my hands dirty and play with this new toy. Phaser (as the JavaDocs say) is very much like a CountDownLatch or a CyclicBarrier but is better suited where:

  1. Parallel operations need to proceed in lockstep
  2. After every step, all parallel operations wait until all others have completed
  3. When they do, all proceed to the next step and so on...
Sounds easy, but understanding the API was not easy for me. So, I had to write a simple program to see how it worked. Here it is:

What the program does is:
  1. Create a Phaser instance and set it up to expect 2 parties and the main thread/itself as a third party
  2. Start 2 producer threads
  3. Start a consumer thread
  4. Unregister itself as the third party and let the 2 producers run in parallel - as 2 remaining parties
  5. In the meanwhile the 2 producer threads pretend to do some work and move forward in phases
    1. There are a total of 10 phases
    2. Each producer writes its results into an array where the array position matches the phase number
    3. I call this array per producer - a lane (like a highway)
  6. While the producers are running, the consumer tries to catch up with them
    1. It waits for the phase to complete and then reads the results of that phase
    2. The consumer is slower and over time it trails behind the producers
    3. When it comes back after pretending to do some processing the producers would've moved - sometimes 1 or even 2 phases ahead
    4. The consumer now reads all the phases that have completed so far and then catches up
  7. This consumer looks like it is reading "read-committed" transactions - doesn't it? 
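The steps above can be sketched as follows. This is a compact reconstruction of the idea rather than the original program: the class name, the sleep times and the values written into the lanes are made up, and the consumer here simply sums the lanes for each completed phase.

```java
import java.util.concurrent.Phaser;

// Sketch of producers moving in lockstep with a slower consumer trailing behind.
// All names, sleep times and lane values are hypothetical.
class PhaserLanesSketch {

    /** Runs the producers and the consumer; returns the consumer's per-phase sums. */
    static int[] run(int producers, int phases) {
        int[][] lanes = new int[producers][phases];  // one lane (array) per producer
        int[] sums = new int[phases];                // filled in by the consumer
        Phaser phaser = new Phaser(producers + 1);   // the producers plus the main thread
        Thread[] threads = new Thread[producers + 1];

        for (int p = 0; p < producers; p++) {
            int lane = p;
            threads[p] = new Thread(() -> {
                for (int phase = 0; phase < phases; phase++) {
                    pause(2);  // pretend to work
                    lanes[lane][phase] = (lane + 1) * 100 + phase;
                    phaser.arriveAndAwaitAdvance();  // wait for the other parties
                }
                phaser.arriveAndDeregister();
            });
        }

        // The consumer is not a registered party; it only observes phase changes.
        threads[producers] = new Thread(() -> {
            int next = 0;
            while (next < phases) {
                // Blocks while the phaser is still in phase `next`, returns at once if it has moved on.
                int current = phaser.awaitAdvance(next);
                // A negative value means the phaser has terminated, so every phase is done.
                int completed = (current < 0) ? phases : Math.min(current, phases);
                for (; next < completed; next++) {
                    for (int[] lane : lanes) {
                        sums[next] += lane[next];
                    }
                    System.out.println("consumer read phase " + next + " -> " + sums[next]);
                }
                pause(5);  // slower than the producers, so it falls behind and catches up
            }
        });

        for (Thread t : threads) {
            t.start();
        }
        phaser.arriveAndDeregister();  // main steps aside; only the producers remain as parties
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return sums;
    }

    static void pause(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        run(2, 10);
    }
}
```

The consumer only ever reads phases that have fully completed, which is what gives it the "read-committed" feel.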

This picture (hopefully) explains more clearly, what is happening in the code:

(Remember - the sequence diagram and log shown here are specific to my computer. On slower or faster machines you might see slightly different results - for the consumer, at least in the beginning phases.)

The log output:

Until next time!

Sunday, July 31, 2011

Java 7 (the good bits)

Finally, it's here!

Like you, I've been waiting for this release for almost 3 years. Sometimes wondering what features would really go into version 7... sometimes wondering if there would even be a 7 (after the acquisition).

Here's the smallest set of links that you'd need, to understand the new features. Of course there are many articles and blogs on the internet with more detailed information:


Thursday, July 28, 2011

Review of EJB 3.1 Cookbook

I just finished reading the EJB 3.1 Cookbook by Richard M. Reese. I haven't used EJBs in almost 5 years and since the JEE spec hadn't undergone any major (interesting) changes I had ignored it....until I heard about the newer and more simplified JEE 6/EJB 3 spec - the one that uses Java annotations. To me this attempt to simplify the standard seems like a very good thing in that the standards body finally started paying attention to the many accusations hurled (rightfully so) upon it by the Spring, Ruby/Rails, Scala and other camps.

So, I was curious to see what the new spec looked like and coincidentally I was given a chance to review this EJB 3.1 Cookbook by Packt Publishing. The book being a cookbook/recipe book, it does not go into the details of why you should use JEE/EJB or how a beginner should get started and other such basics. It assumes that you are already familiar with programming, Java, JEE and specifically EJBs.

I must say that the new spec looks lighter and so much simpler than it did a few years ago.

The book is packed (no pun on the publisher's name) with small and useful recipes focusing on each new aspect of EJB 3.1:

  • Dependency Injection (CDI)
  • Stateful, stateless and singleton EJB annotations
  • Persistent EJB annotations
  • MDB annotations
  • Timers and schedulers
  • Concurrency
  • Startup sequence, named and dependent EJBs
  • Soap and restful web services
  • Security
  • Interceptors (AOP)

It is a good book to help you get up to speed on the latest spec. You might still occasionally need to look up the detailed spec or Google or some other forum for specifics. Overall it is a good, almost vendor agnostic (Oracle Glassfish) and easily digestible read. It does lack some depth in a few places, but if you need more details, you should read the actual spec.

As an aside, I feel all publishers should reduce the prices of their ebook/PDF versions.

Until next time,

Wednesday, June 29, 2011

60F and raining... somebody forgot to switch on Summer.

Saturday, June 25, 2011

The Five Ws (and H) approach

A few days ago, I was trying to explain some technical concepts to a friend. After a bit of explaining I tried listing out reasons, places and times where those concepts would be applicable. I still thought that I had left out something. I then spent some time searching online for ways to - learn systematically and teach correctly by at least outlining the essentials of the problem and encouraging the learner to follow up in his/her own time.

Did I find anything? Of course I did. In fact I discovered too much information but found only some to be simple and interesting. My favorite is what you might think is obvious - The Five Ws approach. Also popular in problem solving is the Five Whys.

Apart from it being obvious, the Five Ws helps you to break down the problem into:

  • When
  • Where
  • What
  • Who
  • Why
  • How
The easiest way to teach this to kids is to use each finger on one hand to stand for a "W". The H is extra.

Now, why is this simple technique relevant? 
    It is:
        Easy to remember
        Easy to explain

    It serves as a starter guide to:
        Formulate the right questions
        Break down the problem
        Cover/analyze all aspects of the problem - like why & why not, what & what not

    It also helps us:
        Remember better by understanding, instead of memorizing
        Identify the problem by looking for signs (5 Ws)

    It works quite nicely as:
        A way to exchange ideas (5 Ws = 5 aspects)

        A way to encourage people (even kids) to think deeper- (Systems thinking)
            By supplying the first 5 questions when stumped
            Progress to other approaches 

        A template to share and disseminate knowledge (5 Ws = 5 steps)
            Like Design Patterns and Anti-patterns for software design
            Simple reproducible steps for QA/Support/Services/junior members  etc.

Here's a simple pictorial way to help you get started. I drew it for myself initially. It is built like a form where you can fill in the blanks, on the right hand side. I encourage you to print it out and use it in meetings too or even to teach your kids.

The 5 Ws and H extended

Until next time!

Saturday, June 18, 2011

Sharding, distributed object graphs and other reading material

[Minor update: June 20, 2011]

Some interesting notes on sharding/data partitioning using SQL:

  1. Database sharding at netlog with MySql and PHP
  2. Distributed Set Processing with Shard-Query
  3. Data partitioning - scaling database
  4. Some previously described here

But, these are more interesting - notes on designing a complex, distributed, object graph:
  1. Beyond the Data Grid: Coherence, Normalisation, Joins and Linear Scalability
  2. Avatara - LinkedIn's home grown OLAP
  3. Some previously described here 

I came across ActiveJDBC (yeah yeah.. I know what you are thinking.. another ORM) but this is simple and only aims to solve the simple (80%) cases. Very unobtrusive.

Some IO related system internals:

A detour:
Until next time!

Thursday, May 19, 2011

Some tools you should not leave home without

In no specific order:

  • After years of searching for hierarchical to-do list managers, frustrating outline and OPML editors, buggy project trackers I found this - ToDoList. It's clean, free (as in speech) and surprisingly, has really nice features
  • yWorks' brilliant diagram editor - Graphity or its thick client cousin - yEd 
  • XMind for mind mapping and brainstorming
  • A decent screen capture and annotation tool - Screenpresso
  • Notepad++, naturally

Sunday, May 08, 2011

Hiking up Mission Peak

The best time of the year to go there - Spring. Preferably, to watch the sun rise or sun set.

View of Bay Area from Mission Peak - yes that's where we started

Mission Peak - view of our Bay Area

Mission Peak

Finally! At the peak after a 2 hr uphill hike

Thursday, April 28, 2011

DSLs, OODBs, system internals and some Java stuff


  • Groovy is perhaps the best language that lends itself for this purpose. Here's a nice (longish) introduction - Groovy.DSLs (from: beginner, to: expert) 
  • A simpler pseudo DSL based on regex - Jbehave, primarily used for Behaviour-driven development in Java
A good article by (The) Martin Fowler on how to develop quality code - for beginners.

A few good articles I came across recently on operating systems, disks, networking:
OODBs - NoSql's grandpa and the forgotten cousin of SQL. Haven't even heard of OODBs? Here's a 4 year old article but still worth reading.

Java tidbits:

Wednesday, April 20, 2011

You know you are growing old(er) when you catch yourself listening to the news more often than music, while driving.

Thursday, April 07, 2011

The little gem that is BusyBox (for Windows)

As a Windows user (no shame) I have, for years, searched for a simple GNU-like toolkit - grep, awk, tail and other such goodies enjoyed by Linux users. Yes, there's Cygwin but it's a beast - too big and a pain to install. Sometimes, I've even resorted to starting a Linux VMWare image just to run a simple awk script to munge some log files.

But I found BusyBox and there's a compact 600KB BusyBox exe for Windows! What does it have? Well.. what does it not have?! It has all the essentials:

[, [[, ar, ash, awk, base64, basename, bash, bbconfig, bunzip2, bzcat,
bzip2, cal, cat, catv, cksum, cmp, comm, cp, cpio, cut, date, dc, dd,
diff, dirname, dos2unix, echo, ed, egrep, env, expand, expr, false,
fgrep, find, fold, getopt, grep, gunzip, gzip, hd, head, hexdump, kill,
killall, length, ls, lzcat, lzma, lzop, lzopcat, md5sum, mkdir, mv, od,
pgrep, pidof, printenv, printf, ps, pwd, rm, rmdir, rpm2cpio, sed, seq,
sh, sha1sum, sha256sum, sha512sum, sleep, sort, split, strings, sum,
tac, tail, tar, tee, test, touch, tr, true, uncompress, unexpand, uniq,
unix2dos, unlzma, unlzop, unxz, unzip, usleep, uudecode, uuencode, vi,
wc, wget, which, whoami, xargs, xz, xzcat, yes, zcat
This is where I'd use it the most - to compress and summarize JVM thread dumps for quick analysis. It's based on JPMP, but here's my version of it. First, download BusyBox for Windows and then use the simple script below to munge your JVM thread dump.

This is the first part of the script:
This is the awk pattern that is used by the script above.
All you have to do now is run it against your thread dump file:
jpmp.bat ..\jstack.out > ..\jstack.out.log

And it converts a huge file like this...
2011-04-07 20:44:06
Full thread dump Java HotSpot(TM) 64-Bit Server VM (16.2-b04 mixed mode):

"TimerQueue" daemon prio=6 tid=0x000000004ec33000 nid=0x1268 in Object.wait() [0x000000004a3cf000]
   java.lang.Thread.State: WAITING (on object monitor)
 at java.lang.Object.wait(Native Method)
 - waiting on <0x0000000034004608> (a javax.swing.TimerQueue)
 - locked <0x0000000034004608> (a javax.swing.TimerQueue)

"D3D Screen Updater" daemon prio=8 tid=0x000000004af1e000 nid=0x186c in Object.wait() [0x000000004f40f000]
   java.lang.Thread.State: TIMED_WAITING (on object monitor)
 at java.lang.Object.wait(Native Method)
 - waiting on <0x0000000034e16ba8> (a java.lang.Object)
"Reference Handler" daemon prio=10 tid=0x0000000000823800 nid=0x18e4 in Object.wait() [0x000000004940f000]
   java.lang.Thread.State: WAITING (on object monitor)
 at java.lang.Object.wait(Native Method)
 - waiting on <0x0000000034dda018> (a java.lang.ref.Reference$Lock)
 at java.lang.Object.wait(
 at java.lang.ref.Reference$
 - locked <0x0000000034dda018> (a java.lang.ref.Reference$Lock)

"VM Thread" prio=10 tid=0x0000000000820000 nid=0x14c4 runnable 

"GC task thread#0 (ParallelGC)" prio=6 tid=0x0000000000777800 nid=0x173c runnable 

"GC task thread#1 (ParallelGC)" prio=6 tid=0x0000000000779800 nid=0x4b8 runnable 

"GC task thread#2 (ParallelGC)" prio=6 tid=0x000000000077b000 nid=0x15d0 runnable 

"GC task thread#3 (ParallelGC)" prio=6 tid=0x000000000077c800 nid=0x199c runnable 

"VM Periodic Task Thread" prio=10 tid=0x0000000049580800 nid=0x1984 waiting on condition 

JNI global references: 1355

Into a tidy summary like this. Very useful if you have thread dumps from 20 servers taken every 30 seconds.
All this without leaving the comfort of your Windows system!
2 j.lang.Object.wait,,,sun.nio.cs.StreamDecoder.readBytes,sun.nio.cs.StreamDecoder.implRead,,,,,,$
1 j.lang.Object.wait,,
1 j.lang.Object.wait,,
1 j.lang.Object.wait,j.lang.ref.ReferenceQueue.remove,j.lang.ref.ReferenceQueue.remove,,
1 j.lang.Object.wait,j.lang.ref.ReferenceQueue.remove,j.lang.ref.ReferenceQueue.remove,j.lang.ref.Finalizer$
1 j.lang.Object.wait,j.lang.Object.wait,,
1 j.lang.Object.wait,j.lang.Object.wait,j.lang.ref.Reference$
1 j.lang.Object.wait,j.lang.Object.wait,j.awt.EventQueue.getNextEvent,j.awt.EventDispatchThread.pumpOneEventForFilters,j.awt.EventDispatchThread.pumpEventsForFilter,j.awt.EventDispatchThread.pumpEventsForHierarchy,j.awt.EventDispatchThread.pumpEvents,j.awt.EventDispatchThread.pumpEvents,
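If you'd rather stay in Java, the same collapse-and-count idea can be sketched as below. This is not a port of the awk script: the rules for trimming frames (drop everything from the opening parenthesis, shorten "java." to "j.") are my approximation of the summary format, and the class name is made up.

```java
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch: collapse each thread's stack into one comma separated line and count duplicates.
// The trimming rules only approximate the summary format; all names are hypothetical.
class StackSummarySketch {

    static Map<String, Integer> summarize(List<String> dumpLines) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        StringBuilder stack = new StringBuilder();
        for (String raw : dumpLines) {
            String line = raw.trim();
            if (line.startsWith("at ")) {
                String frame = line.substring(3);
                int paren = frame.indexOf('(');
                if (paren >= 0) {
                    frame = frame.substring(0, paren);  // drop "(File.java:123)"
                }
                if (frame.startsWith("java.")) {
                    frame = "j." + frame.substring(5);
                }
                if (stack.length() > 0) {
                    stack.append(',');
                }
                stack.append(frame);
            } else if (line.isEmpty() || line.startsWith("\"")) {
                flush(stack, counts);  // a blank line or a new thread header ends the stack
            }
        }
        flush(stack, counts);
        return counts;
    }

    private static void flush(StringBuilder stack, Map<String, Integer> counts) {
        if (stack.length() > 0) {
            counts.merge(stack.toString(), 1, Integer::sum);
            stack.setLength(0);
        }
    }

    public static void main(String[] args) throws Exception {
        // Usage: java StackSummarySketch jstack.out
        Map<String, Integer> counts = summarize(Files.readAllLines(Paths.get(args[0])));
        counts.entrySet().stream()
                .sorted((a, b) -> b.getValue() - a.getValue())
                .forEach(e -> System.out.println(e.getValue() + " " + e.getKey()));
    }
}
```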

Until next time!

Wednesday, March 23, 2011

Dabawalas and other books I read recently

Some interesting books I read in the past few months. Needless to say they were mostly, all Sci-Fi:

Until next time!

What's missing in Google Search

Let's face it, most of the world relies on Google Search to find information on the web. I do. In this day and age of Facebook, Twitter, LinkedIn, social bookmarking sites and other "social applications", Google Search is still years behind.

Some basic features that are sorely missing from their flagship product:

  1. Linking to a search result. Even Google Maps has this feature, but Search does not
  2. Creating a bundle of links like Bitly from some selected search results
Why? Because it makes it easier to share research with colleagues and friends. It also helps to bookmark only interesting links when you are looking for some information and not have to wade through tons of results.

I've suggested this feature to Google a couple of times, but it seems to have fallen on deaf ears. I hope at least the Bing team is listening.

This is what I'd really love. To start with, at least:
(If you were wondering, I used Balsamiq to create the mock-up)

Monday, March 21, 2011

Entropy is the greatest enemy of life, ideas, memory and hope.

Thursday, February 17, 2011

Monterey bay

Monterey bay, originally uploaded by ashwin.jayaprakash.

Monterey bay sunrise

Monterey bay sunrise, originally uploaded by ashwin.jayaprakash.

Monterey bay

Monterey bay, originally uploaded by ashwin.jayaprakash.

Wednesday, February 02, 2011

Going native, yes to SQL and other stories

Some interesting performance related articles and links I've come across in the past few weeks:

A useful thread discussing Voldemort's performance compared to Cassandra. Even more interesting to note is that LinkedIn runs 18G heaps for Voldemort+BDB.

HBase guys doing some work around the JVM GC, reminiscent of arenas and slab allocators.

Native code:
Java Fast Sockets - a research project around alternative, high performance socket implementations for Java and its spin-off company. The company site still looks like it's in stealth mode. A downloadable version would've been nice. A "free" version would've been nicer.

If you thought Java Classloading was tricky and JVM startup times are better these days - look at what the Firefox and Chrome guys have been up to - bypassing normal DLL loading and pre-fetching pages from disk:

Another interesting read about how debuggers work internally. 

Yes to SQL:
Some libraries and notes for SQL sharding and replication:
Some interesting SQL tips and tools:
Until next time!

Tuesday, January 25, 2011

Ray Mears

If you've never heard of Ray Mears and his multiple BBC series on bushcraft, woodlands survival and generally enjoying time out in the woods, then it's time you watched some of his videos. This is unlike other "TV presenter vs nature" shows you commonly see on TV. Ray Mears shows you how to work "with" nature in his relaxed and pleasant presentation style. I would place Ray Mears in the same class as top BBC naturalists/historians/travelers such as Michael Wood, David Attenborough and Michael Palin, to name a few.


Friday, January 21, 2011

Shades Of Gray (Another fix for online reading)

(Mis)Quoting lines from Billy Joel's song "Shades of grey":

Shades of grey wherever I go
.. .
Black and white is [not] how it should be
But shades of grey are the colors I see
It has occurred to me (after being disappointed with the tiny 6" screen of the Kindle) that we spend a lot of time (those of us who do) on the computer staring at bright and mostly white computer screens. We've become accustomed to the brightness of the white backgrounds. I find it quite stressful to read against a white background for long hours.

After spending a few hours searching (alas) online, I found some tricks that have made my computer time even more pleasant.

1) Fix for PDFs - a poor man's ebook reader:
Change the background of all PDFs you read by changing your preferences in Adobe Acrobat Reader. Go to the toolbar "Edit > Preferences > Accessibility > Document color options". Check the box that says "Replace Document Colors" and choose a gray background as shown in the screenshot.

Note the background color - 192-192-192, the magic combination.

And this is what it will look like. You can view it in full screen mode and almost convince yourself that you are reading an ebook.

2) Fix for IntelliJ:
Something similar can be done for IntelliJ. Import this JAR file with color settings for IntelliJ 10 using "File > Import settings".

Who said "gray areas" are bad? Enjoy!

Wednesday, January 19, 2011

Making your online reading even more pleasant

What better way to read a whole bunch of articles that you've been saving for later than to "print" it all and read?

I'm a heavy user of Readable and Readability, but nothing beats printing everything into a PDF, logging off and then reading it comfortably on your ebook reader (if you have one, which I don't. Yet). Well..paper if you really have to.

After a lot of searching I found this wonderful addon for Firefox which lets you print all your webpages, html or text files on disk from the command-line. With this tool you can even write a batch file with all your links and print them into PDFs from the command line (via: MozillaWiki).

Once you've generated all your PDFs, you can even merge them all into a giant PDF using this.

And you are all set!

Saturday, January 08, 2011

So many products... so many choices

Something akin to the Efficient-market hypothesis or the Informationally Efficient Market (the opposite of) that might explain why the market can sustain an ecosystem of products and solutions - even sub-standard ones.

[Update: May 1, 2011] Porter's Five Forces model is a very accurate analysis of this problem.

Hiking in Russian Ridge OSP

Another year and a nice hike to begin it. I've hiked here a few times before and I still love it.

This time I got to see a coyote waiting over the ridge, looking down at passing hikers and getting ready to pounce on a swarm of little birds that were feeding on seeds on the hillside:
(Shot from a camera phone. Bad quality)