Got dirty data? I am very impressed with this data cleansing tool built by the Freebase guys, specifically because it is open source and the UI is so well done - Freebase Gridworks.
Haha, yet another NoSQL system that claims incredible performance - KumoFS.
Spring. No, not the season, I mean the framework. I have many questions about why it's required in light of the supposedly simplified new JEE spec.
I wonder how all those various distributed Lucene index implementations perform. Apache Solr itself offers most of those enterprise features.
- http://www.compass-project.org/
- http://code.google.com/p/terrastore/
- http://github.com/tjake/Lucandra
- http://sna-projects.com/zoie/
- http://katta.sourceforge.net/
The LinkedIn guys have a simple implementation of a load balancer using ZooKeeper and NIO/Netty. Nothing new here, but this one is in Scala. Apparently they like Scala too, just like the Twitter dev team.
Have a nice and long (for some) weekend!
4 comments:
Oh yeah, Twitter's Gizzard is like LinkedIn's Norbert - twitter/gizzard
terrastore-0-5-0-an-interview-with-lead-developer
Realtime Search for Hadoop
Duplicate songs detector via audio fingerprinting - the CodeProject article uses MinHash and locality-sensitive hashing.
Note on the Shazam patent.
Another one: Echoprint.