Wednesday, July 22, 2009

To parse or not to parse

I remember spending a lot of time working on SQL extensions and ANTLR grammar not too long ago. This was during my StreamCruncher days.

Looking back, the effort it required to write a clear, unambiguous grammar, simplify the AST, validate it and then finally produce some executable code was considerable. I distinctly recall that it was not very enjoyable.

Nowadays, the Fluent interface is gaining popularity, especially for implementing mini-DSLs. Here's a introduction to designing DSLs in Java.

For clarity, I'll refer to the custom language/grammar as Stringified-DSL and the Fluent interface as Fluent-DSL.

Now, don't get me wrong I'm all for designing a concise and crisp "language" for specific tasks. That's the whole reason I built my program on SQL extensions (i.e Stringified). But does it always have to be a dumb string - something in quotes, which cannot be refactored, no IDE support like auto-complete, the hassle of providing a custom editor for it...?

It doesn't just end there. When you expose a custom Stringified-DSL you and your users are stuck with it for a very long time. If you want to expose any data structures which your users will eventually ask for - after playing with the simple stringified demos, then you will have to ship the new grammar to them, enhance your custom editor, document them clearly. Also, if the underlying language - let's consider Java suddenly decides to unleash a whole bunch of new awesome APIs, syntax and other language enhancements; which is exactly what happened with Java 5 if you recall - Annotations, Concurrency libraries, Generics. You see what your users would be missing because you chose the Stringified route?

In case you are wondering why your users would want to use Concurrency, data structures and such stuff - we are talking about Technical users and not Business users. Business users will be happy to play with Excel or a sexy UI, so we are not talking about their requirements.

So, what are the benefits of exposing a Fluent-DSL?

  1. Easy to introduce even in a minor version of the product. Quick turn around because it involves mostly APIs

  2. Documentation is easy because it can go as part of the JavaDocs

  3. Easy for the users to consume and use. Lower setup costs and only a mild learning curve

  4. IDE features - Refactoring, Type safety, no custom editor required

  5. Base language features - Users don't have to shoehorn their code into a clumsy custom editor, code can be unconstrained and they are free to use data structures, Annotations, Generics, Concurrency, Fork-Join and all the other programming language goodies. Which is very good in the long run because you don't have to paint yourself into a corner with all that custom stuff

So, what tools does our mild-mannered Developer have at his/her disposal? Some nice Open Source libraries I've spent time about reading come to mind:
To be fair to Stringified-DSLs, there are some nice libraries that would make life easier for everyone involved in case you badly have to go the stringy route.
  • MVEL - my favorite. The Interceptor feature is a nice touch

  • JSF Expression Languages like - JUEL and SpEL

  • MPS from the IntelliJ guys. I haven't understood this fully. It seems to be somewhat like MS Oslo, maybe less. But Oslo itself is hard to understand because of the poor arrangement of their docs

  • DSLs in Groovy - although I'm not very convinced since Groovy itself has a learning curve

If you are a .Net user, how I envy your LINQ extensions. Man that was a brilliant move. I wonder how they managed to get their parsers and editors to work so seamlessly with the regular C# code.



Ashwin Jayaprakash said...

I spotted another new presentation -
Creating DSLs in Java about Eclipse xText. I have to research this.