Markup bridge: from Wikitext to EMF


Today I wanted to talk about one of the components we had to develop for Mylyn Intent: the Markup bridge, which makes it possible to represent any Wikitext-parsable document (written in MediaWiki, Textile, Confluence, TracWiki or TWiki markup) as an EMF model.

Note that this component is completely independent from Intent, and only depends on Wikitext and EMF.

 

1. Why on earth did we do that?

As explained in my previous post, we decided to represent Intent documents as EMF models. An Intent document is nothing more than a set of pure documentation zones (written using a Wikitext syntax) plus additional information (links between the doc and Java classes, Manifest files…).

So to be able to represent an Intent document as a model, we first needed to represent Wikitext content as a model.

2. How does it work?

It is quite simple: we use the Builder design pattern:

- The MarkupParser (provided by Wikitext) sends signals while parsing a text that conforms to a Wikitext syntax (e.g. beginSpan(), image(url, links)…).
- The ModelDocumentBuilder (provided by our markup plugin) intercepts those signals and:
  - changes state if needed (e.g. receiving the beginBlock(LIST) signal in the SSection state makes us move to the SList state),
  - creates the corresponding model elements (a toy sketch of this mechanism is given below).

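To make this more concrete, here is a toy sketch of the state-machine idea. It is not the actual ModelDocumentBuilder: the signal names mirror the Wikitext callbacks described above, but the State enum and the minimal "model" classes are purely hypothetical simplifications.

import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Toy sketch of the builder described above (NOT the real ModelDocumentBuilder).
// The states mirror the SSection/SList states mentioned in the post; the model
// classes below are hypothetical stand-ins for the generated EMF elements.
public class BuilderSketch {

    // Minimal stand-ins for model elements.
    static class Section { final List<Object> children = new ArrayList<>(); }
    static class BulletList { final List<String> items = new ArrayList<>(); }

    enum State { SECTION, LIST }

    private State state = State.SECTION;
    private final Section root = new Section();
    private final Deque<Object> stack = new ArrayDeque<>();
    { stack.push(root); }

    // Signal sent by the parser when a block starts, e.g. beginBlock(LIST).
    public void beginBlock(String blockType) {
        if ("LIST".equals(blockType) && state == State.SECTION) {
            state = State.LIST;                  // 1. change state if needed
            BulletList list = new BulletList();  // 2. create the corresponding model element
            ((Section) stack.peek()).children.add(list);
            stack.push(list);
        }
    }

    // Signal sent by the parser when the current block ends.
    public void endBlock() {
        if (state == State.LIST) {
            stack.pop();
            state = State.SECTION;
        }
    }
}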

3. How can I use it?

Sources of the org.eclipse.mylyn.docs.intent.markup plugin are available on GitHub (or from the Intent git repository).

Then you just have to create a new MarkupParser (pure Wikitext) and plug in the ModelDocumentBuilder (provided by the markup plugin):

// Step 1: create parser
MarkupParser parser = new MarkupParser(new TextileLanguage());
ModelDocumentBuilder builder = new ModelDocumentBuilder();
parser.setBuilder(builder);

// Step 2: parse textile string
parser.parse("h1. Some Textile text");

// Step 3: get corresponding model
Collection correspondingModels = builder.getRoots();

And for those who (like me) enjoy EMF, the markup plugin also defines a WikitextResourceFactory, associated by default with any file having the ".textile" extension, that creates an EMF Resource from a textile file.

Then

new ResourceSetImpl().getResource(URI.createURI("platform:/resource/myProject/myFile.textile"), true)

will return an EMF resource containing the content of the textile file as a model. We even have a prototype, the WikimediaResourceFactory, that allows you to read a page from a wiki and represent it as a model. Pretty useful for exporting the content of a wiki as a PDF, for example.
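Once the resource is loaded, its content can be inspected like any other EMF model. A minimal sketch, assuming the default ".textile" registration is in place and simply printing the type of each root element:

import org.eclipse.emf.common.util.URI;
import org.eclipse.emf.ecore.EObject;
import org.eclipse.emf.ecore.resource.Resource;
import org.eclipse.emf.ecore.resource.impl.ResourceSetImpl;

// Load a textile file through the registered WikitextResourceFactory...
Resource resource = new ResourceSetImpl().getResource(
        URI.createURI("platform:/resource/myProject/myFile.textile"), true);

// ...and walk the resulting model like any other EMF resource.
for (EObject root : resource.getContents()) {
    System.out.println(root.eClass().getName());
}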

4. What are the benefits for me?

Well, representing information as a model brings a lot of benefits, as explained in my previous post. The main one in this context may be that it is straightforward to write new generators for Wikitext documents. The markup.gen plugin contains HTML & LaTeX generators - contributions are welcome :)
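To give an idea of what such a generator can look like, here is a toy sketch that walks the model reflectively and emits very rough HTML. It is not the generator shipped in markup.gen, and the element and feature names it matches ("Section", "Text", "data") are assumptions about the metamodel:

import org.eclipse.emf.ecore.EObject;

// Toy HTML generator that walks the model reflectively (NOT the markup.gen generator).
public final class ToyHtmlGenerator {

    public static String generate(EObject element) {
        StringBuilder out = new StringBuilder();
        String type = element.eClass().getName();
        if ("Section".equals(type)) {
            out.append("<div class=\"section\">");
            for (EObject child : element.eContents()) {
                out.append(generate(child)); // recurse into the section content
            }
            out.append("</div>");
        } else if ("Text".equals(type)) {
            // Hypothetical "data" attribute holding the raw text of the element.
            out.append(element.eGet(element.eClass().getEStructuralFeature("data")));
        } else {
            for (EObject child : element.eContents()) {
                out.append(generate(child));
            }
        }
        return out.toString();
    }
}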

But all the other benefits are interesting too: it is now easy to define validation rules on our textile documents, and you can store them in databases or collaborate in real time on textile documents…
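As a hint of the validation use case: since the documents are plain EMF models, EMF's Diagnostician can already check them, and custom rules can then be contributed through an EValidator. A minimal sketch, where rootElement is one of the EObjects returned by resource.getContents():

import org.eclipse.emf.common.util.Diagnostic;
import org.eclipse.emf.ecore.util.Diagnostician;

// Run the default EMF validation on a root element of the textile model.
Diagnostic diagnostic = Diagnostician.INSTANCE.validate(rootElement);
if (diagnostic.getSeverity() != Diagnostic.OK) {
    for (Diagnostic child : diagnostic.getChildren()) {
        System.out.println(child.getMessage()); // report each problem found
    }
}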

In my opinion, being able to represent any document (for the time being just Wikitext-compliant docs, but we can imagine doing the same thing for AsciiDoc, LaTeX or LibreOffice documents) in a unified way lets you use the right tool for the right task.

I think it makes sense to pull this markup plugin up into the mylyn.docs component (instead of Intent); I have started a discussion about this with the Mylyn Docs community.