Ashwin Jayaprakash's Blog: 2013

Sunday, December 15, 2013

Java/tech stuff I found on the internet (Dec 2013 edition)

Networking and big data:

Java/JVM perf:

Java memory model + arrays + visibility/ordering:

Curios:

Good ElasticSearch + Logstash videos:

Happy holidays!

Sunday, November 24, 2013

Analyzing large Java heap dumps when Eclipse Memory Analyzer (MAT) UI fails

If you find yourself trying to analyze a big heap dump (20-30GB) downloaded from your production server to your staging/test machines, only to find out that X-over-SSH is too slow, then this article is for you.

As of Nov 2013, we have 2 options - Eclipse MAT and a hidden gem called Bheapsampler.

Option 1:
Eclipse Memory Analyzer is obviously the best tool for this job. However, trying to get the UI to run remotely is very painful. Launching Eclipse and updating the UI is an extra load on the JVM that is already busy analyzing a 30G heap dump. Fortunately, there is a script that comes with MAT to parse the heap dump and generate HTML reports without ever having to launch Eclipse! It's just that the command line option is not well advertised.

Command line heap analysis using Eclipse MAT:

Assuming Eclipse MAT is installed and we are inside the mat/ directory, modify MemoryAnalyzer.ini heap settings to use a large heap to handle large dumps:

    -startup
    plugins/org.eclipse.equinox.launcher_1.2.0.v20110502.jar
    --launcher.library
    plugins/org.eclipse.equinox.launcher.gtk.linux.x86_64_1.1.100.v20110505
    -vmargs
    -Xms24g
    -Xmx24g

Run MAT against the heap dump:

    ./ParseHeapDump.sh ../today_heap_dump/jvm.hprof

This takes a while to execute and generates indices and other files to make repeated analysis faster. Then use the indices created in the previous step and run a "Leak suspects" report on the heap dump.

    ./ParseHeapDump.sh ../today_heap_dump/jvm.hprof org.eclipse.mat.api:suspects

The output is a small and easy to download jvm_Leak_Suspects.zip. This has HTML files just like the MAT Eclipse UI. It can be easily SCP'ed/emailed around.

Other report types are possible:

    org.eclipse.mat.api:suspects
    org.eclipse.mat.api:overview
    org.eclipse.mat.api:top_components

More details - http://wiki.eclipse.org/index.php/MemoryAnalyzer/FAQ.

Option 2:
http://dr-brenschede.de/bheapsampler is something I chanced upon. It is a sampling heap dump reader and so it works for very large heap dumps where MAT sometimes fails. Being a sampling reader, the output is also a little imprecise but helps a great deal when you have nothing else. The tool seems to be closed source and is very sensitive to heap dump corruptions.

As an aside, here's something that might be useful for taking the initial heap dump quickly - https://blogs.atlassian.com/2013/03/so-you-want-your-jvms-heap/.

Sunday, November 17, 2013

Book review: Getting Started with Hazelcast

A few weeks ago Packt Publishing sent me a free copy of their new publication - Getting Started with Hazelcast by Mat Johns to read and write about. I have used distributed caches and compute grids quite a bit at work. So, I was happy to do a quick review of this book. I've used Oracle Coherence quite a lot and Hazelcast for some experiments.

The book is a gentle guide to building distributed compute and data grids. It assumes nothing about the reader and hence does a good job of doing what it says in the book's title - "getting started". I'd recommend this book to anyone who is completely new to this area, which is not to be confused with Hadoop, Storm, Cassandra or the other more "popular/hyped" cousins. I would say that for medium sized data, logic heavy, transactional/near real time applications, compute grids are the way to scale out.

Obviously this book is about using Hazelcast, which is a nice Apache software licensed, Java, distributed grid/cache. It is surprisingly feature rich and in terms of usability, features and elegance it comes very close to its more expensive, older, rock solid cousin which is Oracle Coherence.

The book explores the essential aspects of using such frameworks effectively, such as distributed maps, replication, network partitions, fault tolerance, data affinity, moving code closer to where the data is etc. It does this without being too overwhelming for first timers.

For a full and more thorough treatment I would obviously recommend the Hazelcast documentation. And if you are curious to know about other frameworks check out my old write up - Scalable compute & storage frameworks - A Refcard.

Ashwin.

Friday, October 11, 2013

JVM memory management speed, performance related stuff and other links

Here's this season's link fest. Let's start with Java:

JVM memory management slower than C?
Latest Java Puzzlers
http://www.semicomplete.com/blog/geekery/debugging-java-performance.html
http://www.metaltoad.com/blog/plotting-your-load-test-jmeter
TCP-UDP bandwidth sharing (Coherence)
bheapsampler - A very capable heap dump analyzer that uses sampling. Good for large heaps
An alternate way to take heap dumps using GDB
Lambdas before Java 8
Java concurrency bug related to system date (!)

Some JavaOne related posts:

Other cool algos and stuff:

Until next time!

Sunday, September 08, 2013

Fort Bragg, Point Cabrillo lighthouse, Mendocino county trip

Fort Bragg's Glass Beach is (well.. how shall I put it) completely skippable. The Botanical garden is totally worth the visit.

Until next time!

Java, GPU, interesting JVMLS talks, graphs, timing wheel etc.

Here's another dump of interesting/useful tech stuff I read over the past couple of months.

(Notice how I'm getting lazier over time? I'm just dumping links and not even adding my notes or cleaning them up... just raw links)

http://www.infoq.com/articles/low-latency-vp
http://www.cliffc.org/blog/2013/08/15/tcp-is-unreliable/
http://www.networkworld.com/news/tech/2013/062413-hadoop-gpu-271194.html
http://blog.paradoxica.net/post/55585594323/hacking-openjdk-part-2-adding-custom-instrumentation
http://brooker.co.za/blog/2012/11/13/increment.html
http://webtide.intalio.com/2012/12/avoiding-parallel-slowdown-in-jetty-9/
http://blog.jooq.org/2013/08/20/10-subtle-best-practices-when-coding-java/

JVMLS 2013:
OpenJDK at Google
Project Sumatra (GPU)
JVM Benchmarking
Java Native Runtime
Graal and GPU Offload

Coherence rack safe:

http://coherencedownunder.wordpress.com/2012/06/07/making-your-cluster-site-or-rack-safe-with-coherence-3-7-1/
https://blogs.oracle.com/felcey/entry/testing_the_coherence_simple_assignment

Redis - http://tech.3scale.net/2012/07/25/fun-with-redis-replication/

Search and graphs:
http://karussell.wordpress.com/2013/07/22/graphhopper-maps-high-performance-and-customizable-routing-in-java/
http://www.slideshare.net/ZacharyTong/boston-meetupgoingorganic

Cassandra:
Large Queries in Real-Time for Enterprise
http://www.datastax.com/dev/blog/why-cassandra-doesnt-need-vector-clocks

Timing wheel:
http://stackoverflow.com/questions/14250192/how-many-timeouts-can-a-netty-hashedwheeltimer-handle
http://netty.io/4.0/api/io/netty/util/HashedWheelTimer.html
http://www.cubrid.org/blog/dev-platform/more-efficient-timer-implementation-using-timerwheel/

Floating point...yuk:
https://github.com/spencertipping/flotsam
http://randomascii.wordpress.com/2013/07/16/floating-point-determinism/
http://www.apfloat.org/apfloat_java/
http://code.google.com/p/guava-libraries/wiki/MathExplained

Saturday, August 10, 2013

Hiking around Stevens Creek Reservoir

We've been to this place quite a few times. I like the peace and quiet here. It's close to Mountain View, like San Antonio Rancho.

There are multiple trails here. Our favorite is the one around the reservoir. You have to walk on the reservoir wall, near the boat ramp to the inner side of the reservoir. The trail then reaches Stevens Canyon Road again but at the other end of the reservoir. You can turn around and come back the same way or walk back on the road along the reservoir's outer edge, completing a full circle.

Thursday, August 01, 2013

Some good Cassandra, Lucene presentations and misc Comp-Sci posts

About 2 months ago I attended the Cassandra Summit at San Francisco. Yes, I've been meaning to write this blog for a while now. I was surprised (pleasantly) to see such a good turn out. Lot of energy and real world use cases. I didn't get to attend all the talks of course, but all the slides and videos are online. Here are some good ones:

A few JVM related posts worth reading:

Go language and reactions:

If you like the Markdown syntax and want a good, no fuss editor for writing documents:

http://markdownpad.com - for Windows users (paid version can even do tables)
http://dillinger.io - for in-browser use

Some Comp-Sci stuff to keep your (my) mind fit:

Scala and Spark related videos:

DB:

BDB is now AGPL. Out with BDB, in with LMDB
PG and MySQL driver internals
The write cache: Swap insanity tome III

2 Lucene related presentations worth reading:

Until next time!

Sunday, June 23, 2013

Reading list (and RIP Mr. Iain Banks)

Here's my list of books I read these past few months:

RIP - Mr. Iain Banks
Seeker by Jack McDevitt - Watered down sci-fi. Like a direct-to-DVD sci-fi movie. If you can stay awake through the chapter after chapter of filler - like one long, boring episode of Star Trek
Mirror Dance Miles Vorkosigan Adventures - Smart, clever, crisp. Surprisingly interesting story and great character development
Paladin of Souls by Lois McMaster Bujold - A beautifully written fantasy novel. Engrossing and scary. At the same level as China and Dan Simmons
Night Watch by Terry Pratchett - my first Pratchett novel. Not bad at all, light and funny
Planesrunner by Ian McDonald - Interesting but definitely young adult sci-fi. Story and the worlds had a lot of promise but lacks the sophistication of hard core sci-fi. Kiddie stuff
Terry Pratchett - Small gods. Typical irreverential Pratchett style. Funny and not too bad
The Emperor's Soul: Brandon Sanderson. Novella. Makes for a nice, light, quick reading
The Martian by Andy Weir - Amazing piece of near sci-fi, survival. You'll love the detail especially if you are an engineer. Kindle only
Crystal sphere - Short story. Short but nice
Gabble - Some stories are great fun. Where it comes to Polity, it's uncomfortably close to the great Iain Banks' Culture. Neal Asher should've tried something original and not rip off Banks. Still, worth reading
Six directions of space - Alastair Reynolds. Multiple time lines. Abrupt ending. Short story. Should've gone with a longer, novel format

Until next time!

Sunday, June 16, 2013

ForkJoin - a quick exploration .. long overdue

ForkJoin has been available to us Java folks since Java 7 and if you consider the JSR 166 packages, then even longer. I found the time to explore this API only recently.

Having written about Phasers a couple of years ago and realizing that I'd still not found a use for it in production, I was not too eager to explore another "thread-pool" (just kidding - where would we be today without j.u.c classes).

Anyway, I downloaded the latest JDK 8 pre-release (b93), changed my IntelliJ 12 language mode to Java 8-with-lambdas and ran some simple tests.

Mind you, the JavaDocs for ForkJoin and related classes are quite elaborate and expect you to set aside some time to go through it in detail... which you can probably postpone if you read this post.

ForkJoin is recommended as a thread-pool if your main task has to divide itself into a lot of smaller tasks, usually recursively. Usually in such scenarios the number of child tasks is not known upfront. Technically, the work-stealing aspect of ForkJoin and the claim that it scales well when faced with a large number of tasks make it a good fit for such workloads.

Essentially, there are 3 ways in which you can write jobs/tasks to run in a ForkJoinPool - RecursiveAction, RecursiveTask and the new JDK 8 CountedCompleter.

The RecursiveAction is fairly simple. It embodies the logic to work on the root of your computation problem. It also splits its work into smaller sub-tasks recursively. It is very similar to a binary search, except that searching each half is spawned off as a sub-task recursively. The computation for this tree of tasks completes when the leaf nodes are processed.

I can think of a simplified but realistic use case where you'd want to do a mix of sync and async, parallel sub-tasks:

1. Receive purchase order request from client
2. Convert request payload (JSON, XML) to Java object
3. Make synchronous authorization check with LDAP
4. Make some async requests
   - Make async request to inventory service to check and reserve stock
   - Make async request to shipment service and find closest free shipment date to requested destination
   - Make async request to fetch similar/recommended items to offer package deals
5. Consolidate results of async requests
6. Generate response JSON

You could do steps 2, 3 and 6 in a regular ThreadPoolExecutor. If you need to accommodate priority purchase order processing then you could easily do it with a combination of PriorityBlockingQueue and the right constructor on TPE.
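Here is a minimal sketch of the PriorityBlockingQueue + TPE combination. The PrioritizedOrder class, its priority numbers and the "regular"/"priority"/"bulk" labels are made up for illustration. The pool is deliberately limited to one thread and the first task is held back on a latch so that the remaining orders actually queue up and get re-ordered.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.PriorityBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical task type: a purchase order with a priority (lower number runs first).
final class PrioritizedOrder implements Runnable, Comparable<PrioritizedOrder> {
    final int priority;
    final Runnable work;

    PrioritizedOrder(int priority, Runnable work) {
        this.priority = priority;
        this.work = work;
    }

    @Override public void run() { work.run(); }

    @Override public int compareTo(PrioritizedOrder other) {
        return Integer.compare(priority, other.priority);
    }
}

final class PriorityPoolDemo {
    static List<String> runDemo() {
        List<String> processed = new CopyOnWriteArrayList<>();
        CountDownLatch gate = new CountDownLatch(1);

        // The constructor that accepts a BlockingQueue<Runnable> lets us plug in the priority queue.
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                1, 1, 0L, TimeUnit.MILLISECONDS, new PriorityBlockingQueue<Runnable>());

        // Occupy the only worker so that the remaining orders pile up in the queue.
        pool.execute(new PrioritizedOrder(0, () -> {
            try { gate.await(); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        }));

        // Use execute(), not submit(): submit() wraps the task in a FutureTask, which is not Comparable.
        pool.execute(new PrioritizedOrder(5, () -> processed.add("regular")));
        pool.execute(new PrioritizedOrder(1, () -> processed.add("priority")));
        pool.execute(new PrioritizedOrder(9, () -> processed.add("bulk")));

        gate.countDown();
        pool.shutdown();
        try {
            pool.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return processed;
    }

    public static void main(String[] args) {
        System.out.println(runDemo());
    }
}
```

One thing to watch out for: the priority ordering only applies to tasks that are waiting in the queue. A task handed directly to an idle worker thread never passes through the queue.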

In fact, there are so many implementations of BlockingQueue, for example LinkedTransferQueue and SynchronousQueue which could be useful in some special cases. The Exchanger is another such nugget in the j.u.c package. Apparently CompletableFuture is ideal for such cases (like Scala's Promise and Google Guavas' ListenableFuture) but I was surprised to see there were no examples in the JavaDoc.

(Ok, this is turning out to be a longer post than I had expected. Not a quick exploration after all)

Going back to our example, the 3 asynchronous operations in step 4 might be modeled as sub-tasks of step 4. In reality though, the JavaDoc for ForkJoinPool says that ForkJoinTasks should ideally not block on external resources like I/O. This is called "unmanaged synchronization" as it involves waiting for resources outside the fork-join system. For that the ManagedBlocker is recommended, although to me it looks like it was added only as an afterthought.
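For what it's worth, here is roughly what using a ManagedBlocker looks like. The QueueTaker class is adapted from the example in the ForkJoinPool.ManagedBlocker JavaDoc. The surrounding demo (the "replies" queue standing in for an external service and the "stock reserved" message) is purely hypothetical.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.ForkJoinTask;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Adapted from the QueueTaker example in the ForkJoinPool.ManagedBlocker JavaDoc.
final class QueueTaker<E> implements ForkJoinPool.ManagedBlocker {
    private final BlockingQueue<E> queue;
    private volatile E item;

    QueueTaker(BlockingQueue<E> queue) { this.queue = queue; }

    // Called by the pool when the task really has to block.
    @Override public boolean block() throws InterruptedException {
        if (item == null) item = queue.take();
        return true;
    }

    // Called first to check whether blocking can be avoided altogether.
    @Override public boolean isReleasable() {
        return item != null || (item = queue.poll()) != null;
    }

    E getItem() { return item; }
}

final class ManagedBlockerDemo {
    static String awaitReply() {
        // Hypothetical stand-in for a reply arriving from an external service.
        BlockingQueue<String> replies = new LinkedBlockingQueue<>();
        ForkJoinPool pool = new ForkJoinPool(1);
        try {
            ForkJoinTask<String> task = pool.submit(() -> {
                QueueTaker<String> taker = new QueueTaker<>(replies);
                // Tells the pool that this worker may block, so it can compensate if needed.
                ForkJoinPool.managedBlock(taker);
                return taker.getItem();
            });
            replies.put("stock reserved");
            return task.get(10, TimeUnit.SECONDS);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(awaitReply());
    }
}
```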

So, sadly the above seemingly real-world example might not be a good case for ForkJoin. Which means the ideal use case is something that involves recursively decomposing and pure computation - a.k.a in-memory map-reduce.

So, we make our way back to the overly geeky sort-merge example used in the JavaDocs. In my case, I decided to dispense with the sorting part and simplified the problem even further - purely for illustration purposes.

In my examples, I use ForkJoin to recursively split and list numbers from "start" to "end". At each step if the start to end range is larger than 5 it splits that range into 2 equal halves and forks them off as sub-tasks. Otherwise that task is the leaf level and just adds the numbers in a for-loop from start to end into a queue that is passed around to all tasks.

The first test is a naive implementation of RecursiveAction where it just keeps forking away sub-tasks till the leaf levels. The thread that created the root level task attempts to wait for the whole tree of computations to complete. But since each level spawns the next level of 2 sub-tasks asynchronously ("fork()") and does not wait for the children to complete, the whole tree completes asynchronously. This way the caller thread in the "main()" method comes out of "invoke()" prematurely. As a result this recursive task is almost what we wanted but not entirely.

Since the naive approach of forking did not suffice, we make a small change by making the parent task wait for its children to complete by calling the "join()" method on its children.
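A minimal sketch of this fork-then-join version follows. The class name RangeListAction and the list() helper are my own for this illustration and the threshold of 5 follows the description above. Dropping the two join() calls gives you the naive version, where invoke() may return before the leaves have finished.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveAction;

// Illustrative names: lists the numbers from start to end (inclusive) into a shared queue.
final class RangeListAction extends RecursiveAction {
    static final int THRESHOLD = 5;

    final int start;
    final int end;
    final Queue<Integer> sink;

    RangeListAction(int start, int end, Queue<Integer> sink) {
        this.start = start;
        this.end = end;
        this.sink = sink;
    }

    @Override protected void compute() {
        if (end - start + 1 <= THRESHOLD) {
            // Leaf level: just add the numbers in a for-loop.
            for (int i = start; i <= end; i++) sink.add(i);
            return;
        }
        int mid = start + (end - start) / 2;
        RangeListAction left = new RangeListAction(start, mid, sink);
        RangeListAction right = new RangeListAction(mid + 1, end, sink);
        left.fork();
        right.fork();
        // Wait for the children, most recently forked first. Without these two calls
        // the parent completes while its sub-tree is still running.
        right.join();
        left.join();
    }

    static List<Integer> list(int start, int end) {
        Queue<Integer> sink = new ConcurrentLinkedQueue<>();
        ForkJoinPool pool = new ForkJoinPool();
        try {
            pool.invoke(new RangeListAction(start, end, sink));
        } finally {
            pool.shutdown();
        }
        // Leaves finish in no particular order, so sort before returning.
        List<Integer> result = new ArrayList<>(sink);
        Collections.sort(result);
        return result;
    }

    public static void main(String[] args) {
        System.out.println(list(1, 20));
    }
}
```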

An even better approach is to allow each task to fork away sub-tasks and not have to "join()" on them, because waiting only means that a thread sits in an idle-wait state when it should've been "stealing" work from other threads and making progress. What we need is a way to let the sub-tasks notify the parent task that they have completed. We can let this bubble up all the way and register a listener at the root.

For the listener we will even use the fancy Lambda feature and something from the new java.util.function package to register a listener. In fact completion listeners can be registered at any level - for example to print to the console that a certain % of the tree is complete and so on. There are 2 versions of this. The first sub-classes CountedCompleter to simply let the completions bubble up and eventually notify the blocked calling thread in "main()".

The second is a more sophisticated implementation using Lambdas.
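As a rough sketch of the idea, the class below sub-classes CountedCompleter and accepts an optional java.util.function.Consumer as the completion listener. The class name, the listener's signature and the listAndCount() helper are assumptions made for this illustration. The pending-count handling follows the pattern shown in the CountedCompleter JavaDoc.

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.CountedCompleter;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;

// Illustrative names: same range-listing problem, but completions bubble up instead of join().
final class RangeListCompleter extends CountedCompleter<Void> {
    static final int THRESHOLD = 5;

    final int start;
    final int end;
    final Queue<Integer> sink;
    final Consumer<Queue<Integer>> listener; // optional, may be null

    RangeListCompleter(CountedCompleter<?> parent, int start, int end,
                       Queue<Integer> sink, Consumer<Queue<Integer>> listener) {
        super(parent);
        this.start = start;
        this.end = end;
        this.sink = sink;
        this.listener = listener;
    }

    @Override public void compute() {
        if (end - start + 1 > THRESHOLD) {
            int mid = start + (end - start) / 2;
            // Two children will report back. This must be set before forking.
            setPendingCount(2);
            new RangeListCompleter(this, start, mid, sink, null).fork();
            new RangeListCompleter(this, mid + 1, end, sink, null).fork();
        } else {
            for (int i = start; i <= end; i++) sink.add(i);
        }
        // Either completes this node (leaf) or gives up this node's own share of the count.
        tryComplete();
    }

    // Invoked once this node and its whole sub-tree are done.
    @Override public void onCompletion(CountedCompleter<?> caller) {
        if (listener != null) listener.accept(sink);
    }

    static int listAndCount(int start, int end) {
        Queue<Integer> sink = new ConcurrentLinkedQueue<>();
        AtomicInteger seenByListener = new AtomicInteger(-1);
        ForkJoinPool pool = new ForkJoinPool();
        try {
            // The lambda is the listener registered at the root of the tree.
            pool.invoke(new RangeListCompleter(null, start, end, sink,
                    q -> seenByListener.set(q.size())));
        } finally {
            pool.shutdown();
        }
        return seenByListener.get();
    }

    public static void main(String[] args) {
        System.out.println("Numbers listed: " + listAndCount(1, 20));
    }
}
```

Passing a non-null listener to the intermediate nodes as well would give you the "certain % complete" style of progress reporting.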

Here's an even more sophisticated example that wraps the fork-join pool as a CompletionService and submits 5 tasks and then picks up the results as they complete.
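A stripped down sketch of that idea is below. It relies on the fact that ForkJoinPool is an ExecutorService and can therefore be handed to an ExecutorCompletionService. The tasks here (squaring a number) and the class name are placeholders, not what the real tasks would do.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ForkJoinPool;

final class ForkJoinCompletionDemo {
    static List<Integer> squares(int taskCount) {
        ForkJoinPool pool = new ForkJoinPool();
        // Wrap the fork-join pool so that results can be picked up as they complete.
        CompletionService<Integer> service = new ExecutorCompletionService<>(pool);
        try {
            for (int i = 1; i <= taskCount; i++) {
                final int n = i;
                service.submit(() -> n * n); // placeholder task
            }
            List<Integer> results = new ArrayList<>();
            for (int i = 0; i < taskCount; i++) {
                // take() blocks until the next task, whichever it is, has finished.
                results.add(service.take().get());
            }
            // Completion order is not deterministic. Sorted here only for stable output.
            Collections.sort(results);
            return results;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(squares(5));
    }
}
```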

There are a few other things worth reading up about. I skipped the use of RecursiveTask. I also skipped mentioning the different methods to "steal" tasks. The RecursiveAction JavaDocs even has an example that keeps track of spawned sub-tasks and then follows that chain to try and complete them if another thread has not already done it. The reason I did not venture into this bit is because I'm not sure as of now whether it is worth doing this instead of letting the FJ framework do the scheduling internally.

Without studying the source code I can only guess that manually keeping track of spawned sub-tasks and then trying to unfork them would be to help complete that sub-tree of tasks quickly. If we were to use a simple queue to just dump sub-tasks like in the ThreadPoolExecutor, then they would get mixed up with other sub-tasks from other threads in the pool. This means that the current sub-tree may not complete on time because the dependent sub-tasks are somewhere at the back of the queue. This is where FJ shines in addition to its scalability.

One thing we do lose with FJ is that tasks do not have priorities unlike using a PriorityBlockingQueue with TPE so you might end up using multiple FJ pools.

Until next time!

Saturday, June 08, 2013

Hiking in Huddart County Park

Nice, secluded, close to I-280. Trails always in the shade, good even in summer. They even have camp sites and picnic benches.

$6 entrance fee. No maps, trail directions are a little confusing. Especially when you are coming back to the parking area there are many roads and unmarked trails branching off. Funnily, we had trouble finding our parking lot at the end. We did not have such problems while hiking though. The single used map they did have was lacking in detail about the parking areas.

Wednesday, June 05, 2013

Diesel - DSL experiments on the JVM (Part 2)

In part 1 we explored some simple ways to fake a DSL using JSON and YAML. In part 2 we will expend a little more effort to build a more powerful mini-DSL.

By "powerful", I mean a DSL that not only supports the language structure we like but also allows for more complex constructs like method calls and expressions.

For this task, I chose Groovy, which is a really nice, Ruby-like language that is tightly integrated with Java. Later, I will also explore Ruby using JRuby just to show how similar Groovy and Ruby are in many aspects.

Groovy runs on the JVM and so it integrates seamlessly with Java. Even IntelliJ comes with native support for Groovy. Groovy can be used like a plain scripted, interpreted language or even compiled. It is a little slower than Java but offers a lot of powerful metaprogramming features to balance it out. I will not go into the details but suffice it to say that it is particularly useful for building (surprise!) DSLs. For full blown examples see Cloudify, Gradle, this or this.

At first, I chose the simple approach of just using Groovy as a scripting language to specify the stocks. It didn't really look like a DSL because it is not.

Next, with just a tiny bit of setup where I make some ready made expressions and methods available to the script, I was able to specify my stocks in a nicer and more powerful format. The cool "with" syntax in Groovy also helped.

To demonstrate that I could also write executable code, I had the script print the date and name of the file in the first line.

I've just scratched the Groovy surface because I could've spent a lot more time overloading numeric types like the goodFor property where I could've used "30.days" but I didn't. There are obviously holes in my implementation but you cannot dismiss the speed at which you can get at least this much functionality. Perhaps with a little more Groovy proficiency and time, I could've done better.

Now on to Ruby. To show how similar Groovy and Ruby are, I used JRuby to build this DSL:

I also have a simpler, raw JRuby script but then it's not a DSL but just a script. Again note the similarities with Groovy.

Getting JRuby to integrate neatly with my Stocks beans was a little challenging. It required a slightly different setup. I also ran into some issues which are not documented well in JRuby. So, I asked for help on the mailing list but haven't heard from them. This coupled with the fact that JRuby startup takes several seconds made testing and experimenting a little frustrating.

Ruby in itself is used in a lot of places to build DSLs like Chef, RSpec and a lot of other projects.

One option I obviously overlooked is Scala. Scala is well known (among the Scala users) for its "apparently" powerful language features. However, in my opinion Scala's complex, sometimes bizarre, dense and obtuse syntax might keep it out of reach of average engineers like myself. I've shared this opinion earlier too.

So, that's it for now. I may extend this preliminary work on Diesel later when I have the time. Or better yet, you can fork it and share it.

Cheers!
Ashwin.

Diesel - DSL experiments on the JVM (Part 1)

I've been meaning to write about my series of little experiments building a mini-DSL on the JVM.

I've worked with and written about expression evaluators before. However, there've been many instances where I've felt the need to quickly build a part pseudo-language and part configuration script.

Also, I did not have the time, resources nor the justification to build a full fledged grammar/parser. There were times where I did morph an existing ANTLR grammar for something else, but in the end I realized that a simple hand built tokenizer and AST would've done the trick. (Note to self: try Parboiled)

So, I was curious to see what options I had to build an "almost language", quickly. "Quickly" being the operative word. "Dirty" being the unsaid word.

I'm not going to go into the details of what a DSL is, or spend time debating over internal or external DSLs etc. Enough material is available on the internet and some books too.

If you want to learn more about Java API based DSLs - more commonly known as a Fluent DSL, there are several good places to start learning by example - Jooq, Google Guava ComparisonChain etc. I've built Fluent DSLs several times and it is a cleaner and better way to implement the Builder design pattern - like Google Protocol Buffers' Builder.
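To give a flavor of what a Fluent DSL looks like in plain Java, here is a small sketch. The StockOrder type and all its method names (buy, quantity, atLimit, goodForDays) are hypothetical and not taken from any of the libraries mentioned above.

```java
// Hypothetical order type used only to illustrate the fluent builder style.
final class StockOrder {
    enum Side { BUY, SELL }

    final Side side;
    final String symbol;
    final int quantity;
    final double limitPrice;
    final int goodForDays;

    private StockOrder(Builder b) {
        side = b.side;
        symbol = b.symbol;
        quantity = b.quantity;
        limitPrice = b.limitPrice;
        goodForDays = b.goodForDays;
    }

    // Entry points that read like the start of a sentence.
    static Builder buy(String symbol)  { return new Builder(Side.BUY, symbol); }
    static Builder sell(String symbol) { return new Builder(Side.SELL, symbol); }

    static final class Builder {
        private final Side side;
        private final String symbol;
        private int quantity;
        private double limitPrice;
        private int goodForDays = 1;

        private Builder(Side side, String symbol) {
            this.side = side;
            this.symbol = symbol;
        }

        // Each step returns the builder so that the calls can be chained.
        Builder quantity(int quantity)   { this.quantity = quantity; return this; }
        Builder atLimit(double price)    { this.limitPrice = price; return this; }
        Builder goodForDays(int days)    { this.goodForDays = days; return this; }

        // Validation happens once, at the end of the chain.
        StockOrder build() {
            if (quantity <= 0) throw new IllegalStateException("quantity must be positive");
            if (limitPrice <= 0) throw new IllegalStateException("limit price must be positive");
            return new StockOrder(this);
        }
    }

    public static void main(String[] args) {
        StockOrder order = StockOrder.buy("ACME").quantity(100).atLimit(12.5).goodForDays(30).build();
        System.out.println(order.side + " " + order.quantity + " " + order.symbol);
    }
}
```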

This time though, I wanted to evaluate options to build something that could read/parse/load files that look like structured, readable English. Some common cases where you'd need this:

Configuration scripts are prime candidates for this. In the early-mid 2000's, XML would've been the way to go; with XPath and XSDs/DTDs; built in support in the JDK and support for hierarchical structures
Glue to stitch together different modules in a program - something that usually involves some configuration code and basic expressions
Actual mini-languages that allow business analysts or IT/DevOps people to plug in some logic without writing complex Java code. Also without having to get developers and a full build cycle involved

So, let's cut to the chase and see what I came up with.

For my tests, I wanted to accomplish something very simple. I wanted a way to describe stock buying or selling instruction to my stock broker. It's a completely contrived example of course but it seemed valid for this test. I wanted a way to specify which stock to buy or sell, at what price, for how long the instruction is valid and some other little things.

Since I brought up XML, I'll talk about the simplest approach first - XML's slightly less ugly cousin JSON:

Using JSON and calling it a DSL is not only dumb but also cheating. But there are obviously a lot of places where this would suffice. Unlike XML, this is less verbose, but it still needs the user to know where to put double quotes, braces, square brackets and all this without a schema to validate the file.

It does have its advantages. All I had to do was create a JavaBean with all the possible combinations my "stock specification" could have and then use Google Gson to do the serialization/deserialization to/from JSON.

This is what the JavaBeans look like:

Assuming that this was enough, all I had to do was read the JSON into the Stocks bean and related inner classes and start using it.

The keen reader will notice that the Order class has some properties - limit, stopLimit, market which are really mutually exclusive. JSON does not prevent me from providing values to all 3 which would be wrong. I could've spent some more time fleshing those properties into an enum or a complex string but I'll leave that as an exercise for later (or the reader).
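One possible way to do that exercise is sketched below. The field types (Double, Double, Boolean) and the resolveType() method are guesses made for illustration. The actual Order bean in the Diesel repo may well look different.

```java
// Hypothetical shape of the Order bean: three mutually exclusive pricing properties.
final class Order {
    enum Type { LIMIT, STOP_LIMIT, MARKET }

    Double limit;      // assumed type
    Double stopLimit;  // assumed type
    Boolean market;    // assumed type

    // Collapses the three properties into one enum value, rejecting ambiguous input.
    Type resolveType() {
        int set = 0;
        Type type = null;
        if (limit != null) { set++; type = Type.LIMIT; }
        if (stopLimit != null) { set++; type = Type.STOP_LIMIT; }
        if (Boolean.TRUE.equals(market)) { set++; type = Type.MARKET; }
        if (set != 1) {
            throw new IllegalArgumentException(
                    "Exactly one of limit, stopLimit or market must be set, found " + set);
        }
        return type;
    }

    public static void main(String[] args) {
        Order order = new Order();
        order.stopLimit = 42.0;
        System.out.println(order.resolveType());
    }
}
```

A check like this would run right after deserialization, since the JSON parser itself has no way of knowing about the rule.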

The full source along with scripts can be found on my GitHub Diesel repo for your reference.

So, I decided that JSON wouldn't cut it. A while ago I had played with YAML briefly, which is JSON's distant cousin. Actually, the latest YAML spec makes it JSON's parent (how convenient).

YAML is like JSON but without the frivolous double quotes and braces. Compare this YAML file with the previous JSON file, it speaks for itself:

It is without doubt, cleaner and more usable than JSON. You use SnakeYaml to do automatic ser/deser into the Stocks JavaBean like Gson.

Also, like Gson, if you don't have a bean or your configuration makes it difficult to map directly to a bean, you can just read it free form as a map of maps. This would be a poor man's AST. Gson's free form structure is actually better that way, in that it almost looks like XML Nodes.

If YAML is good enough for you, you can stop reading right here. In fact, YAML is also my favorite for simple configuration files that involve lists and hierarchies. This is miles ahead of and better than the flat format used in Java Properties.

But, defining YAML still has the same issues that JSON had with regards to semantic validations like limit, market etc. However this is really an issue with the way I've created the beans. Think of the YAML file as a free form AST. You'd have to write your semantic and syntactic validations in your Java code by walking this AST. I prefer to do this in Java because it's easier to have all the validations and exception messages in one file than split it across multiple ANTLR and Java files.
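Walking such a free form tree could look something like the sketch below. It assumes the file has already been loaded into nested Maps and Lists by the YAML/JSON library. The key names "orders" and "symbol" are invented for this example, while limit, stopLimit and market are the properties discussed earlier.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class StockSpecValidator {
    // Walks the map-of-maps and collects readable error messages instead of failing fast.
    static List<String> validate(Map<String, Object> root) {
        List<String> errors = new ArrayList<>();
        Object orders = root.get("orders"); // hypothetical top-level key
        if (!(orders instanceof List)) {
            errors.add("'orders' must be a list");
            return errors;
        }
        int index = 0;
        for (Object element : (List<?>) orders) {
            String where = "orders[" + index++ + "]";
            if (!(element instanceof Map)) {
                errors.add(where + " must be a map");
                continue;
            }
            Map<?, ?> order = (Map<?, ?>) element;
            if (!(order.get("symbol") instanceof String)) {
                errors.add(where + ": missing 'symbol'");
            }
            // Semantic rule: the pricing properties are mutually exclusive.
            int priced = 0;
            for (String key : new String[] {"limit", "stopLimit", "market"}) {
                if (order.containsKey(key)) priced++;
            }
            if (priced != 1) {
                errors.add(where + ": expected exactly one of limit, stopLimit, market but found " + priced);
            }
        }
        return errors;
    }

    public static void main(String[] args) {
        Map<String, Object> order = new HashMap<>();
        order.put("symbol", "ACME");
        order.put("limit", 12.5);
        order.put("market", true);
        Map<String, Object> root = new HashMap<>();
        root.put("orders", Arrays.asList(order));
        System.out.println(validate(root));
    }
}
```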

In part 2, we will explore other framework and language choices. Until next time, take care!
Ashwin.

Wednesday, May 29, 2013

Camping at Joshua Tree National Park

This Memorial Day weekend we camped at Joshua Tree National Park. Campsites were full and we were just lucky to get (probably) the last campsite that wasn't already taken up. Next time we go camping we have to remember to get there a day early at these "first come first served" sites.

We spent a little less than 24 hours at the park. Camping was fun and we managed to do a moderately strenuous hike to 49 Palms Oasis. Overall it was not bad. Being a desert there's really not much to write home about.

So, we spent an evening at Knott's Soak City on our way back from Joshua Tree.

The next day we did a nice hike in LA, where the world famous Hollywood sign is posted on the side of a hill.

Overall the weather was warm and sunny, with cool winds blowing, which made hiking at both places enjoyable.

Cheers!

Monday, May 13, 2013

A collection of "low level" JVM and JavaScript related articles and more

Here's a nice collection of "low level" JVM and JavaScript related articles. None of which would be on anyone's list for low level programming.

While doing some reading on math and matrix operations in Java I came across many projects trying to overcome the limitations of the JVM while trying to implement numerical recipes:

A good (slightly old) paper on ways to store matrices more efficiently
Some evaluation of the different libraries that do efficient, element wise matrix operations
More interestingly, we still don't have SSE support in Java for matrix operations - a request that has been open since 2007

While we are stuck with this, Dart seems to be making better progress by supporting SIMD instructions. JavaScript is getting weirder ("low level") with Emscripten and Asm.js.

Other interesting Java related reading material:

Unsafe and lazy set - I've long wondered what lazySet() did
Packed Objects - tuples in the JVM
An evaluation of collections in Java for primitives
Transform your JARs into executables
JVM heap dumps using GDB
I had no idea "top -H" would display all threads in a process/JVM
Crash the JVM instantly to see if your app can survive crashes (hint: Unsafe.putLong(0, 0) to force a segfault)
Disruptor + log4j2 = very fast, async logging
If you've never seen Doug Lea on stage before - Future of the JVM, panel discussion
Charlie Hunt – The Fundamentals of JVM Tuning, a nice tutorial
Gallery of processor cache effects
Two good Cassandra tech talks

More next time! (Of course)

Monday, April 22, 2013

Graphs, machine learning, PostGres and other tidbits

I hadn't pushed out my "favorite reads of the season" for a while. So, here's a bunch of links to keep you occupied over the next few days.

Graphs, search and recommendations:

Under the Hood: Building out the infrastructure for Graph Search
LinkedIn's Cleo and Search (Cleo claims to have been inspired by FB's graph search)
GraphLab - another take on large scale, distributed graph processing. Similar to Apache Giraph but not based on Hadoop code.
Graph Based Recommendation Systems at eBay. Graphs, algebra and Cassandra. (I had to go back to basics to understand slide 9)

Statistics, machine learning presentations and resources:

A couple of peculiar, networking related blog entries.

Discussion on Redis mailing list about SSD / Twitter Fatcache / Facebook McDipper and a follow up.

While doing some research on NoSQL systems, especially Cassandra, I was surprised to hear that newer releases of Cassandra are moving away from the flexible, semi-structured column families. Instead with CQL, there is a, well, somewhat restrictive, repetitive schema that should work well for certain workloads. Is it me or does it look like NoSQL is grudgingly moving towards SQL?

Speaking of SQL, PostGres is moving in the other direction. Recent (9.x+) versions have some very interesting column data types - Array, HSTORE, JSON etc. Of course, its SQL support is obviously fantastic.

And finally, a nice talk on trade processing and a paper on MongoDB for finance.

Ashwin.

Sunday, April 14, 2013

Importing OpenSSL/EC2 .pem keypair to Java keystore

I spent several hours scouring the internet looking for a way to import (OpenSSL) Amazon EC2's .pem keypair into a Java keystore. At the end of this frustrating exercise I was baffled to see how scattered the information was.

(FYI, doing this on Windows, especially the OpenSSL interaction part, for self-signing a certificate was painful even with Cygwin. I had to resort to using my Linux distro running in a VM)

To save myself time in the future and for those of you tearing your hair out looking for the same information, here it is. (The file paths are not real. You have to clean them up to match your setup):

Here are my references in no particular order:

To complement this, there were other things I had to do (being a first time user of EC2) to make my EC2 instance accept SSH connections:

And then to install Oracle JDK 7 on my EC2 Ubuntu image:

http://thedaneshproject.com/posts/how-to-install-java-7-on-ubuntu-12-04-lts/

Ashwin.

Thursday, March 21, 2013

Information Overload

Information overload a.k.a:

Too many websites, blogs, apps, social networks and not enough unification (and time)
Whatever happened to open formats? (ahem.. RSS/Atom?)

Wednesday, February 27, 2013

YesSQL, JVMs that need to be NUMA aware & other stories

Here's a whole bunch of fascinating reading material I've accumulated these past few months. You can tell there's a lot of love going on for SQL/RDBMS. Then some crazy JVM deployments that make you sit up and wonder. There's also quite a bit of performance related articles on UI/browser technologies.

Data tier:

JVM:

How much memory objects take on JVMs
NUMA-Aware Java Heaps for in-memory databases (Watch for the line that says "heaps approaching 1TB" in size! Of course they mean using DirectBuffers but still..)

Here's a nice tool that I've filed for later. Esp useful if you find yourself doing production/support calls - Your logs are your data: logstash + elasticsearch. Sort of a poor man's Splunk.

UI (mostly beating the life out of HTTP and JavaScript):

Faster Websites: Crash Course on Web Performance
How browsers work
WebKit in Your Living Room
Building the Netflix UI for Wii U
High Performance Networking in Google Chrome
Talks To Help You Become A Better Front-End Engineer In 2013 (There are some good ones there)

After covering all 3 tiers - DBs, JVMs and UIs, why stop there when you can finish it off by learning something about QA/unit testing? Here are some relatively new JUnit features (they've finally caught up with TestNG):

JUnit:

Rules
Parameterized tests and its better description here
Assumptions
Theories and its nicer description here
Matchers

That should keep you busy for several days. Until next time!
