Friday 5 June 2015

Tutorial: Language Development with XText: A Logo example - Part I: Introduction to XText

Brief description of XText

XText is a Language Workbench, meaning an environment for developing languages.

It targets everything from small Domain-Specific Languages (DSLs) to complex General-Purpose Languages.

What's good about it (and it has many good points) is that it gives you a solid base for quickly starting a language: it provides advanced features based only on a declarative description of your language.

To give you an idea:

  • you don't need to write a Lexer
  • you don't need to write a Parser
  • you don't need to write a code editor from scratch
  • and so on..
It has pretty good defaults for almost all aspects of a language (like scoping and references).

So if you already know XText, at least at a basic level, you know what I mean.
If you don't, then I encourage you to read the following links:

If you happen to know Spanish, then this link can be useful:

Intro to XText

The grammar

The starting point for an XText language is a file called "the grammar".
It's a kind of BNF (Backus-Naur Form) declaration of the language, if you know what BNF is.
If you don't, then don't worry (although I would highly recommend reading Naur's papers about software development and the industry ;)).

An XText grammar is itself written in a specific format.
The goal of this grammar is to define two aspects of your language together:

  • The Syntactical Form: the allowed symbols of your language and their order, spaces, keywords, etc.
  • The Semantic Model: the core "concepts" of your language and how they are composed. This will be represented as a class-based object-oriented design, but you don't need to write those classes.
Let's look at an example.
Suppose you want to write a language for specifying products and their prices. Then the syntax part could define keywords like 'prod', '$', etc.
But your concepts are "a Product", "a Price", and the idea that "a Product has a current price".

Here is a sample grammar for the Hello example:

This tells us that a file for this language will contain a "Model".
The model has a list of "Greeting".
A Greeting has a name whose format is an "ID".

In classes it will look like this:

And indeed, XText generates these classes automatically for you (actually, interfaces), based on the grammar file.
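To give a rough idea of the shape of those interfaces, here is a minimal plain-Java sketch. It is an approximation, not the generated code: the real interfaces extend EMF's EObject and use EList rather than java.util.List, and the getter names getGreetings() and getName() are assumed from the description above. SimpleGreeting, SimpleModel and SemanticModelSketch are hypothetical helpers that exist only so the sketch can run on its own.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified stand-ins for the generated interfaces (assumption: the real
// ones extend EMF's EObject and return EList instead of List).
interface Greeting {
    String getName();
}

interface Model {
    List<Greeting> getGreetings();
}

// Hypothetical in-memory implementations, only here to make the sketch runnable.
class SimpleGreeting implements Greeting {
    private final String name;

    SimpleGreeting(String name) {
        this.name = name;
    }

    @Override
    public String getName() {
        return name;
    }
}

class SimpleModel implements Model {
    private final List<Greeting> greetings = new ArrayList<>();

    @Override
    public List<Greeting> getGreetings() {
        return greetings;
    }
}

class SemanticModelSketch {
    // Builds by hand the model that a file with two greetings would produce.
    static Model sample() {
        Model model = new SimpleModel();
        model.getGreetings().add(new SimpleGreeting("World"));
        model.getGreetings().add(new SimpleGreeting("Uqbar"));
        return model;
    }

    public static void main(String[] args) {
        for (Greeting greeting : sample().getGreetings()) {
            System.out.println(greeting.getName());
        }
    }
}
```

With XText you would never write SimpleGreeting or SimpleModel yourself: the framework generates the implementations and fills them in while parsing.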

Besides that, you probably noticed that Greeting has some strings there.
That's part of the syntax.

In terms of syntax, you read that rule (Model and Greeting are called "rules") as follows:

"A greeting is defined by the keyword 'Hello', then an ID you must provide, and then an exclamation mark".

These are valid examples of the DSL:

  • Hello World !
  • Hello Uqbar       !
  • Hello Project!
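
Purely to illustrate how that rule reads, here is a small Java sketch that recognizes the same shape with a regular expression. This is not how XText works internally (it generates a real lexer and parser for you). The class GreetingSyntaxSketch and the method parseName are hypothetical, and the ID pattern is a simplification that assumes the default terminals.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GreetingSyntaxSketch {
    // 'Hello', then an ID (a letter or underscore followed by letters, digits
    // or underscores), then '!'. Whitespace between tokens is skipped.
    private static final Pattern GREETING =
        Pattern.compile("\\s*Hello\\s+([A-Za-z_][A-Za-z_0-9]*)\\s*!\\s*");

    // Returns the greeting's name, or null when the text does not match the rule.
    static String parseName(String text) {
        Matcher matcher = GREETING.matcher(text);
        return matcher.matches() ? matcher.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(parseName("Hello World !"));
        System.out.println(parseName("Hello Uqbar       !"));
        System.out.println(parseName("Hello Project!"));
        System.out.println(parseName("Hello !")); // no ID, so not a valid greeting
    }
}
```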

The Runtime Part

Just by defining the grammar XText will generate for you:
  • The classes for your semantic model
  • The parser, lexer and linker that will read a file and create instances of your semantic model
  • A rich text editor with autocomplete, reference browsing and searches, syntax colouring
  • An outline view
  • And many extension points
What is up to you now is what to do once the file is parsed into the semantic model.
I mean, what the language is supposed to do!

XText will parse the text into instances of your semantic model and then you choose what to do with it.
Here's a sample diagram (sorry, it's in Spanish):

For our example, should a file with greetings write them to the console? Send an SMS?
Publish them to Facebook?
Whatever.

That's what we call the runtime part of your language.

And there's a whole new world there depending on the path you choose.

Basically with XText there are 3 options:

  1. Generating code
  2. Inferring (?) code
  3. Interpreting

We will discuss the first two options briefly, just to give you an idea, and then focus on the last one.
The first two strategies are covered by the XText documentation and tutorials, but you won't find much information about interpreted XText languages.

Languages that Generate Code

This first option basically consists of visiting all the object instances of your semantic model, once they have been parsed, and somehow generating executable code.

It's abstract like that, because you could generate:
  • Textual code: for example java code, or C#, or python, or even C
  • Binary code: directly executable code.
Generating textual code allows you to reuse an already functional backend or virtual machine like the JVM, which saves you the considerable work of creating your own VM.

The bad part is that your language cannot be directly executed.
If you generate Java code then to run your program you will need three steps:

  1. Run the generator: which will produce .java code
  2. Compile the Java code
  3. Execute the compiled Java code

If you use an interpreted language like Python, then just two:
  1. Run the generator: which will produce .py code
  2. Execute the Python interpreter
Of course, you can always wrap these steps in a set of tools and scripts for the end user, so that running a program just means running a single command.

Anyway, the point is that you generate code.
How ?
For textual code, XText already provides a way to do this by implementing the IGenerator interface:

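    import org.eclipse.emf.ecore.resource.Resource
    import org.eclipse.xtext.generator.IFileSystemAccess
    import org.eclipse.xtext.generator.IGenerator

    class MiGenerador implements IGenerator {
        override void doGenerate(Resource resource, IFileSystemAccess fsa) {
            // visit every Model found in the parsed file
            for (e : resource.allContents.toIterable.filter(Model)) {
                ...
            }
        }
    }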

Then, like in a templating engine, you can write string expressions with dynamic parts using Xtend's rich strings:

    // fullyQualifiedName assumes an injected "extension IQualifiedNameProvider"
    def compile(Entity e) '''
        package «e.eContainer.fullyQualifiedName»;
        public class «e.name» {
        }
    '''

What's good about this strategy is that, at first, it is "simple" to generate code: you just write the target code embedded in Xtend and replace the dynamic parts with expressions.

The bad part is that if your language is not "small", and you start to have a complex semantic model or complex rules for generating code, and you need to reuse templates and so on, then it tends to get messy and difficult to maintain.

Also (and this will make more sense when we see the second strategy), you could be generating invalid code without your generator being aware of it.
It will blow up in the user's face when they try to run the generated code.
Think of type issues in the generated Java code: missing imports, assignments between incompatible types, etc.
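
To make that last point concrete, here is a minimal plain-Java sketch of template-style generation. It does not use XText or Xtend at all: TemplateGeneratorSketch and its compile method are hypothetical names, and string concatenation stands in for the rich string template. The point is only that the generator splices in whatever it is given and has no idea whether the result compiles.

```java
class TemplateGeneratorSketch {
    // Produces Java source text for one class: a fixed template with the
    // dynamic parts (package and class name) spliced in as plain strings.
    static String compile(String packageName, String className) {
        return "package " + packageName + ";\n"
             + "\n"
             + "public class " + className + " {\n"
             + "}\n";
    }

    public static void main(String[] args) {
        // A sensible input gives sensible Java source.
        System.out.print(compile("org.example", "Product"));

        // Nothing stops the template from emitting text that javac will reject;
        // the error only shows up later, when someone compiles the output.
        System.out.print(compile("org.example", "not a valid name"));
    }
}
```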


The JVM Inferrer Model

The second strategy is also about generating code, but not just any kind of code: only Java code.
Yep. Deal with that. You can only generate Java code.
The good news is that you don't need to write the generator in Java. You'll use Xtend.
And the end user of your language won't write Java code either, so Java is just a good intermediate language that saves you from writing something like assembler.

So, the JVM Inferrer also generates Java code, but the way it does it is completely different.
You won't use code templates made of plain strings.
You will use an API to generate code.

So you are not responsible for writing the code "text", but for mapping your semantic model into a model which represents Java concepts like Class, Method, Field, etc.

The generator is also an Xtend class which implements a given interface:

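    import org.eclipse.emf.ecore.EObject
    import org.eclipse.xtext.xbase.jvmmodel.IJvmDeclaredTypeAcceptor
    import org.eclipse.xtext.xbase.jvmmodel.IJvmModelInferrer

    class DomainmodelJvmModelInferrer implements IJvmModelInferrer {
        override void infer(EObject model, IJvmDeclaredTypeAcceptor acceptor, boolean preIndexPhase) {
            ...
        }
    }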

Here's a code snippet which, given a Property object from the semantic model, generates a field, a getter and a setter for it as members of the class that is currently being generated:

    // one case of a switch over the features of the class being generated
    Property : {
        members += feature.toField(feature.name, feature.type)
        members += feature.toGetter(feature.name, feature.type)
        members += feature.toSetter(feature.name, feature.type)
    }

There are a number of advantages to using this inferrer or mapper.

The first one is that the generator is now code, and you have an API which models all the Java source concepts, so you should be able to design complex code generation logic by applying all the good OOP practices.

But the most important part is that XText keeps a link between the generated Java code and the semantic model element which generated that part of the code.

In this case XText will know that the generated field, getter and setter were derived from a particular instance of your semantic model.
Even better, given a semantic model instance, XText knows which part of the text it came from.

So it is really clever: when it detects that the generated Java code has a problem (like the compilation errors we mentioned before), it will point to the original piece of code that the user wrote.

It also supports more complex features, like inferring types from the generated Java code and then applying validations back to your language. This prevents the user from writing code that would not compile once translated to Java.
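
The idea of keeping a link from generated members back to the source can be sketched in plain Java. Everything here is hypothetical and far simpler than the real XText API: PropertySketch stands in for a semantic model element, JavaMemberSketch for a generated Java member, and InferrerSketch for the mapping. It only shows why a problem found in generated code can be reported against the line the user wrote.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical element of the DSL's semantic model.
class PropertySketch {
    final String name;
    final String type;
    final int sourceLine; // line of the DSL file where the user declared it

    PropertySketch(String name, String type, int sourceLine) {
        this.name = name;
        this.type = type;
        this.sourceLine = sourceLine;
    }
}

// Hypothetical model of a generated Java member. It remembers which
// semantic model element it was derived from.
class JavaMemberSketch {
    final String kind; // "field", "getter" or "setter"
    final String name;
    final String type;
    final PropertySketch source;

    JavaMemberSketch(String kind, String name, String type, PropertySketch source) {
        this.kind = kind;
        this.name = name;
        this.type = type;
        this.source = source;
    }
}

class InferrerSketch {
    // Maps one property to a field, a getter and a setter.
    // Assumes the property name is not empty.
    static List<JavaMemberSketch> infer(PropertySketch property) {
        String capitalized = Character.toUpperCase(property.name.charAt(0))
            + property.name.substring(1);
        List<JavaMemberSketch> members = new ArrayList<>();
        members.add(new JavaMemberSketch("field", property.name, property.type, property));
        members.add(new JavaMemberSketch("getter", "get" + capitalized, property.type, property));
        members.add(new JavaMemberSketch("setter", "set" + capitalized, property.type, property));
        return members;
    }

    // Reports a problem found in a generated member against the DSL line it came from.
    static String report(JavaMemberSketch member, String problem) {
        return "line " + member.source.sourceLine + ": " + problem
            + " (in generated " + member.kind + " " + member.name + ")";
    }

    public static void main(String[] args) {
        PropertySketch price = new PropertySketch("price", "Mony", 3);
        for (JavaMemberSketch member : infer(price)) {
            System.out.println(report(member, "unknown type " + member.type));
        }
    }
}
```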

You need to see it with your own eyes. Go and follow their docs here.

There are also some examples of this strategy.
Otherwise we will create a new tutorial in the future :)

Interpreted Languages

The last strategy is to avoid generating code completely.
The reasoning goes: XText is a Java framework, and it's already instantiating Java objects for our semantic model. Those objects are available from Java, so instead of visiting them to generate code, why don't we just visit them to perform the actual execution?
For example, for each Greeting, we print it to the console through System.out.println().

And that's basically the interpreted model.

You'll basically need something like a Main class which receives the path of the file that you want to interpret.
It will call XText to parse this file into semantic model instances, and then go through them to interpret them and do something.
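
As a rough sketch of that flow for the greetings example, here is a self-contained Java version. The important assumption: the parse method below is a regular-expression stand-in for the XText parsing step, so that the example runs without any XText dependency. GreetingInterpreterSketch and its methods are hypothetical names, and a real interpreter would walk the generated semantic model objects instead of a list of strings.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GreetingInterpreterSketch {
    private static final Pattern GREETING =
        Pattern.compile("Hello\\s+([A-Za-z_][A-Za-z_0-9]*)\\s*!");

    // Stand-in for the XText parsing step (assumption): collects the name
    // of every greeting found in the source text.
    static List<String> parse(String source) {
        List<String> names = new ArrayList<>();
        Matcher matcher = GREETING.matcher(source);
        while (matcher.find()) {
            names.add(matcher.group(1));
        }
        return names;
    }

    // The interpreter proper: visits each greeting and acts on it directly,
    // without generating any code. Returns what it printed.
    static List<String> interpret(List<String> greetingNames) {
        List<String> output = new ArrayList<>();
        for (String name : greetingNames) {
            String line = "Hello " + name + "!";
            System.out.println(line);
            output.add(line);
        }
        return output;
    }

    // Receives the path of the file to interpret, parses it and runs it.
    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: GreetingInterpreterSketch <file>");
            return;
        }
        String source = new String(Files.readAllBytes(Paths.get(args[0])), StandardCharsets.UTF_8);
        interpret(parse(source));
    }
}
```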

This is the strategy that we will focus on for the LOGO example.
Its advantages are:
  • there's no intermediate language
  • there are no extra commands or steps to execute
  • at first, and if the language is simple, an interpreter is a really good way to have something working right away.
The disadvantages are more difficult to explain right now. We will wait for the end of the post.
But basically, the classes of the semantic model are not 100% "your classes". They are generated, so you cannot touch them, and that adds complexity. It won't allow you to fully design with OOP (although XText, through Xtend, offers many cool features for tackling that, like multi-methods and extension methods).
Also, you are directly using objects that are a kind of AST objects. They are tied to all the Ecore and EMF infrastructure, so you cannot instantiate them easily. This means that your language's core classes are tied to Eclipse technology and frameworks.
There's also a runtime overhead there (plugins, dependencies, Eclipse, OSGi, etc).