Friday, December 18, 2015

Vasquez Rocks and Channel Islands

This is three months late, but better late than never, eh? We took a trip to LA over the Labor Day weekend. Naturally, we had to visit Vasquez Rocks, one of the most frequently featured rock formations in sci-fi films.

We still had plenty of daylight, so we decided to drive up the famous Mulholland Drive, which proved to be quite disappointing.

The main attraction was the Channel Islands. We had to book our boat tickets well in advance.

Remember to carry sandwiches, snacks, water and sodas with you, as there is nothing available on the island. The cliffs overlooking Potato Harbor are a great place to stop for lunch, but the harbor itself is inaccessible from the top. The only access to the water is by boat.


Hiking at Pinnacles National Park

We spent the day after Thanksgiving at Pinnacles National Park. The weather was crisp and cool, with very light traffic, probably because everyone else was busy shopping.

We did a 5-mile hike via the reservoir to Scout's peak and back to the visitor's center.

I would love to do this hike again, but probably not during the warmer part of the year. It was great to see 23-million-year-old lava formations, but scary to know that they are so close to the Bay Area.

Sunday, December 06, 2015

Fall 2015 tech reading

Java:

System:
Big systems:
Misc:
Until next time!

Saturday, October 10, 2015

Late summer 2015 tech reading

This should keep you busy for a few weekends.

(Once again, thanks to all the people who shared some of these originally on Twitter, Google+, HackerNews and other sources)

Java/Performance:

Java Bytecode Notes:
Java 8/Lambdas:
Tech Vids:
Data:
Misc:
Some old notes on SQL Cubes and Rollups:
Until next time!

Wednesday, August 12, 2015

Summer 2015 tech reading and goodies

Java:
Go:
Graph and other stores:
  • http://www.slideshare.net/HBaseCon/use-cases-session-5
  • http://www.datastax.com/dev/blog/tales-from-the-tinkerpop
  • TAO: Facebook's Distributed Data Store for the Social Graph
    (snippets)
    Architecture & Implementation
    All of the data for objects and associations is stored in MySQL. A non-SQL store could also have been used, but when looking at the bigger picture SQL still has many advantages:
    …it is important to consider the data accesses that don’t use the API. These include back-ups, bulk import and deletion of data, bulk migrations from one data format to another, replica creation, asynchronous replication, consistency monitoring tools, and operational debugging. An alternate store would also have to provide atomic write transactions, efficient granular writes, and few latency outliers
  • Twitter Heron: Stream Processing at Scale
    (snippets)
    Storm has no backpressure mechanism. If the receiver component is unable to handle incoming data/tuples, then the sender simply drops tuples. This is a fail-fast mechanism, and a simple strategy, but it has the following disadvantages:
    Second, as mentioned in [20], Storm uses Zookeeper extensively to manage heartbeats from the workers and the supervisors. Use of Zookeeper limits the number of workers per topology, and the total number of topologies in a cluster, as at very large numbers, Zookeeper becomes the bottleneck.
    Hence in Storm, each tuple has to pass through four threads from the point of entry to the point of exit inside the worker process. This design leads to significant overhead and queue contention issues.
    Furthermore, each worker can run disparate tasks. For example, a Kafka spout, a bolt that joins the incoming tuples with a Twitter internal service, and another bolt writing output to a key-value store might be running in the same JVM. In such scenarios, it is difficult to reason about the behavior and the performance of a particular task, since it is not possible to isolate its resource usage. As a result, the favored troubleshooting mechanism is to restart the topology. After restart, it is perfectly possible that the misbehaving task could be scheduled with some other task(s), thereby making it hard to track down the root cause of the original problem.
    Since logs from multiple tasks are written into a single file, it is hard to identify any errors or exceptions that are associated with a particular task. The situation gets worse quickly if some tasks log a larger amount of information compared to other tasks. Furthermore, an unhandled exception in a single task takes down the entire worker process, thereby killing other (perfectly fine) running tasks. Thus, errors in one part of the topology can indirectly impact the performance of other parts of the topology, leading to high variance in the overall performance. In addition, disparate tasks make garbage collection related-issues extremely hard to track down in practice.
    For resource allocation purposes, Storm assumes that every worker is homogenous. This architectural assumption results in inefficient utilization of allocated resources, and often results in over-provisioning. For example, consider scheduling 3 spouts and 1 bolt on 2 workers. Assuming that the bolt and the spout tasks each need 10GB and 5GB of memory respectively, this topology needs to reserve a total of 15GB memory per worker since one of the worker has to run a bolt and a spout task. This allocation policy leads to a total of 30GB of memory for the topology, while only 25GB of memory is actually required; thus, wasting 5GB of memory resource. This problem gets worse with increasing number of diverse components being packed into a worker
    A tuple failure anywhere in the tuple tree leads to failure of the entire tuple tree. This effect is more pronounced with high fan-out topologies where the topology is not doing any useful work, but is simply replaying the tuples.
    The next option was to consider using another existing open-source solution, such as Apache Samza [2] or Spark Streaming [18]. However, there are a number of issues with respect to making these systems work in its current form at our scale. In addition, these systems are not compatible with Storm’s API. Rewriting the existing topologies with a different API would have been time consuming resulting in a very long migration process. Also note that there are different libraries that have been developed on top of the Storm API, such as Summingbird [8], and if we changed the underlying API of the streaming platform, we would have to change other components in our stack.
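The memory arithmetic in the Heron snippet about homogeneous workers (3 spouts, 1 bolt, 2 workers) is easy to check with a few lines of Python. This is only a rough sketch: the function name `homogeneous_reservation` and the largest-first round-robin placement are my own assumptions for illustration, not Storm's actual scheduler. It just shows why reserving the same amount for every worker over-provisions.

```python
def homogeneous_reservation(task_mem_gb, num_workers):
    """Return (per_worker, reserved, required) memory in GB.

    Assumed placement (hypothetical, not Storm's scheduler): tasks are
    handed out largest-first, one per worker in turn. Every worker is then
    sized like the fullest one, which is the "homogeneous worker" assumption.
    """
    workers = [[] for _ in range(num_workers)]
    for i, mem in enumerate(sorted(task_mem_gb, reverse=True)):
        workers[i % num_workers].append(mem)

    per_worker = max(sum(w) for w in workers)  # size of the fullest worker
    reserved = per_worker * num_workers        # every worker gets that size
    required = sum(task_mem_gb)                # what the tasks really need
    return per_worker, reserved, required


# Numbers from the snippet: one bolt at 10GB, three spouts at 5GB each.
tasks = [10, 5, 5, 5]
per_worker, reserved, required = homogeneous_reservation(tasks, 2)
print(per_worker, reserved, required, reserved - required)  # 15 30 25 5
```

One worker ends up with the bolt plus a spout (15GB) and the other with two spouts (10GB), yet both are reserved at 15GB, which gives the 30GB reserved versus 25GB required from the paper.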
Misc:
Until next time!

Monday, June 01, 2015

Spring 2015 reading list

Here's a giant list of articles I read and liked (hat tip to people I follow on Twitter/Blogs. I'm just re-sharing this):

Saturday, May 30, 2015

Lassen Volcanic National Park

We spent the Memorial Day weekend hiking at Lassen Volcanic National Park.

The drive was pleasant, with surprisingly no traffic on the way there or back. The weather was great and we did 2 hikes - Cinder Cone and a hike to Devil's Kitchen and Boiling lake in Drakesbad.
Unfortunately, one of the main attractions at the park (Bumpass Hell - heh, real name) was not yet open for the year.

Sunday, April 12, 2015

A simple guide to using Unix/GNU Linux command line tools for fiddling with log files (*runs on Windows too)

I've been meaning to write this post for years now. Every time I thought about compiling a basic list, I told myself "Nah... there must be tons of examples on the net". Yes, there are tons of them, but I couldn't find anything:

  • That helped absolute noobs with a consolidated list
  • That demonstrated actual fiddling with Java log files
  • That works on Windows(!) No, I don't mean the awful Cygwin tool but something like Busybox or the wonderful Gow
So, here it is:
Enjoy!

Sunday, February 01, 2015

Starting 2015 with yet another link dump

A belated happy new year! Here's some reading material I've been accumulating for a few months.

Distributed systems:

Performance related:
On tuning:
Misc tech articles:
Formatting comments on Gerrit:
That's it for now!