Experimenting with Scala and Acceleo
During the last few days, I have been prototyping some new stuff but contrary to some of my previous experiments, this time I will not try to create something new, I’ll try to improve an existing component of Acceleo, the Acceleo parser.
The current situation
For this experiment, I choose to concentrate my effort on the analysis of the text. For those who are not familiar with parsing and more specifically the Acceleo parser, here is how its working:
Acceleo parses the text in a given file to create an abstract syntax tree representing the different parts of the text (module declaration, templates queries, if blocks, etc) Then Acceleo resolves the links between modules (module A is extending module B, module B is importing module C, etc) After that, Acceleo uses the MDT OCL project to parse the OCL expressions (guards, body of the queries and OCL expressions in templates, etc)During all those steps, we are also detecting some potential errors here and there. In the first part, we detect the most important errors to quickly find out invalid modules, then in the next steps, we will look for logical errors like undefined operations, duplicated template declaration, or even invalid metamodels. This prototype will for now only try to replicate the first step with the analysis of the text to produce an AST.
The classes used by Acceleo to parse the text into an unresolved AST have 1107 + 1781 + 373 + 440 + 281 = 3982 lines of code. This does not take into account all the code dedicated to the parsing of OCL expression and the resolution of the links between modules. It can detect a huge number of errors in an Acceleo module to find out invalid modules, few of those detect logical problems (an OCL keyword used as the name of a template for example).
Experimenting is fun
This prototype of Acceleo parser in Scala can successfully parse all the valid Acceleo modules in the Acceleo parser unit tests and it detect at least one problem in all the invalid ones with only 217 lines of code. It would be completely irrelevant to compare both numbers like that but it illustrates how the support for parser in the Scala standard library can help to write very concise and efficient parsers. This Scala parser does not have any comment and it is missing the detection of about a third of all the errors detected by the original Acceleo parser, yet I believe that I can achieve the same level of quality for about 400 lines of code.
Talk is cheap. Show me the code. (Linus Torvalds)
The problem with writing parsers in Java is the lack of support for parsers in the Java standard library. On the contrary, in Scala, it is very easy to start writing a parser because Scala provides you with lot of basic implementations of parsers.
After spending some time on internet following some tutorials to master those implementations, I quickly realized that even if those implementations would allow me to do 80% of the work in no time, the remaining 20% would be a real challenge and I would have to create my own customs parsers. In Acceleo, text literal and static text in a template can contain pretty much all the delimiters and tokens that you can think of. Regular expressions and token based parsers cannot come even close to offer a valuable solution to parse pieces of text with such complex patterns.
So, in order to do handle this problem, I had to create custom parsers and for that there is only one concept to understand. Since Scala is object oriented and a functional language, methods are first class citizens just like object and they can be used as a parameter of an other method for example. With the help of closures you can access variables out of the scope of your method and write easily concise and efficient piece of code but there is one thing often forgotten. Since method are first class citizens, you can also extends them, so in the end, a parser is just a subclass of a function that takes an “Input” and returns a “ParserResult”.
abstract class Parser[+T] extends (Input => ParseResult[T]) {…}
To create your own custom parser, you just have to create a function that will returns a “Success” or a “Failure” object like in the following example.
I also discover that you cannot create a parser combinator that combines several parser combinators from other classes, a problem often ignored in most tutorials yet it can be bypassed in an elegant way by mixin traits.
I’ll need to work more on this prototype but I’m progressing very quickly since Scala and Java have lots of similar concepts. So stay tune for more information.