
Deriving Syntax Highlighters from Context-Free Grammars

Language workbenches provide an integrated experience for developing software languages. Language users are supported by editor services such as syntax highlighting, jump-to-definition, and outlines. The goal of this project is to leverage existing editors for providing such services. More specifically, the project is about generating syntax highlighting support from context-free grammars.

Many editors (e.g., VS Code, TextMate, Sublime Text, Atom, ACE, CodeMirror) and highlighters (e.g., Highlight.js, GitHub) accept state-based language definitions for syntax highlighting. In the Rascal language workbench, however, coloring is derived from context-free grammars. How can we derive state-based highlighters from Rascal's context-free grammars?
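To make "state-based language definition" concrete, here is a minimal sketch in Python (chosen only for illustration; the prototype itself is to be written in Rascal). The `MODE` table, its token classes, and the `tokenize` function are all hypothetical and do not follow the format of any particular editor. Each state holds an ordered list of rules: a regular expression, a token class, and an optional next state.

```python
import re

# Hypothetical mode: state name -> ordered rules (regex, token class, next state).
# A token class of None means "leave unstyled"; a next state of None means "stay".
MODE = {
    "start": [
        (r'"', "string", "string"),            # opening quote: enter string state
        (r"//.*", "comment", None),
        (r"\b(?:if|else|while)\b", "keyword", None),
        (r"[A-Za-z_]\w*", "identifier", None),
        (r"\s+", None, None),
    ],
    "string": [
        (r"\\.", "string", None),              # escaped character
        (r'"', "string", "start"),             # closing quote: back to start
        (r'[^"\\]+', "string", None),
    ],
}


def tokenize(text, mode=MODE, state="start"):
    """Return (token class, lexeme) pairs; unstyled text is skipped."""
    pos, tokens = 0, []
    while pos < len(text):
        for pattern, token_class, next_state in mode[state]:
            m = re.compile(pattern).match(text, pos)
            if m and m.end() > pos:
                if token_class is not None:
                    tokens.append((token_class, m.group()))
                pos = m.end()
                state = next_state or state
                break
        else:
            pos += 1  # no rule matched: skip one character unstyled
    return tokens


print(tokenize('if x "hi" // c'))
```

The current state decides which rules apply: `//` starts a comment in the `start` state, but is plain string content in the `string` state. This kind of context is all a state-based highlighter has, which is why a context-free grammar has to be approximated before it can be expressed in this form.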

As a starting point, you'll take the approach detailed in this paper: Mohri and Nederhof, "Regular approximation of context-free grammars through transformation", in Robustness in Language and Speech Technology, Springer, 2001.
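The core transformation from that paper can be sketched as follows, again in Python for illustration only. For a set M of mutually recursive nonterminals, each A in M gets a fresh nonterminal A' with A' -> epsilon, and each rule of A is cut after every occurrence of a nonterminal from M, so that the result is right-linear with respect to M. The grammar representation (a list of `(lhs, rhs)` pairs) and the function name `approximate` are assumptions of this sketch. It also assumes the caller supplies M; the full algorithm computes these sets itself and only transforms the ones that are not already strongly regular.

```python
def approximate(rules, recursive):
    """Apply the transformation to one set of mutually recursive nonterminals.

    rules     -- list of (lhs, rhs) pairs, rhs being a tuple of symbols
    recursive -- the set M; assumed here to be supplied by the caller
    """
    def prime(nonterminal):
        return nonterminal + "'"

    # Rules for nonterminals outside M are kept unchanged.
    result = [(lhs, tuple(rhs)) for lhs, rhs in rules if lhs not in recursive]

    # Every A in M gets a fresh A' with A' -> epsilon.
    for a in sorted(recursive):
        result.append((prime(a), ()))

    for lhs, rhs in rules:
        if lhs not in recursive:
            continue
        head, chunk = lhs, []
        for symbol in rhs:
            if symbol in recursive:
                # Cut the rule right after each nonterminal from M.
                result.append((head, tuple(chunk) + (symbol,)))
                head, chunk = prime(symbol), []
            else:
                chunk.append(symbol)
        # The remaining tail ends in A', where A is the original left-hand side.
        result.append((head, tuple(chunk) + (prime(lhs),)))
    return result


# S -> a S b | epsilon   (the language a^n b^n, which is not regular)
RULES = [("S", ("a", "S", "b")), ("S", ())]
for lhs, rhs in approximate(RULES, {"S"}):
    print(lhs, "->", " ".join(rhs) or "epsilon")
```

For this input the result is S -> a S | S' and S' -> b S' | epsilon, which generates a*b*. That is a regular superset of the original language: the approximation no longer checks that the numbers of a's and b's agree, which is the kind of trade-off the project should document.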

Resources on state-based language tokenizers:

Deliverables

  • A prototype for transforming grammars into highlighters (in Rascal).
  • A precise description of the algorithm, including limitations and trade-offs.
  • Evaluation of the prototype on several Rascal-defined languages, including Rascal itself.

Informal milestones

  • Provide a detailed description of the abstract syntax of Rascal's grammar formalism and its semantics.
  • Define the abstract syntax of an intermediate grammar formalism to represent regular approximations of context-free grammars, catering for Rascal's lexical disambiguation constructs (e.g., follow restrictions, keyword reservation, etc.)
  • Transform a Rascal grammar to the intermediate representation using the Mohri/Nederhof algorithm.
  • Map the resulting regular approximation to the TextMate state-based "mode" data type.
  • Generate concrete JSON/XML/... for specific editors to be able to test the highlighters.
  • Evaluate the approach using various grammars from Rascal's standard library, including Rascal's own grammar, on one or more editors.
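The last steps of this pipeline (intermediate representation, mapping to a mode, generating JSON) could look roughly like the Python sketch below. The `TokenRule` type, the `to_textmate` function, and the example rules are hypothetical; only the keys `name`, `scopeName`, `patterns`, and `match` are taken from the TextMate grammar format. The sketch assumes a follow restriction (written `!>>` in Rascal) can be rendered as a negative lookahead, and that keyword reservation can be approximated by listing keyword patterns before identifier patterns.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class TokenRule:
    """Hypothetical intermediate form of one lexical category."""
    scope: str                             # TextMate-style scope prefix
    regex: str                             # regular approximation of the category
    not_followed_by: Optional[str] = None  # follow restriction, as a character class


def to_textmate(name, scope_name, rules):
    """Map the intermediate rules to a TextMate-style grammar dictionary."""
    patterns = []
    for rule in rules:
        match = rule.regex
        if rule.not_followed_by is not None:
            # Follow restriction rendered as a negative lookahead.
            match += f"(?!{rule.not_followed_by})"
        patterns.append({"name": f"{rule.scope}.{name.lower()}", "match": match})
    return {"name": name, "scopeName": scope_name, "patterns": patterns}


# Keywords come first so that they win over identifiers at the same position.
RULES = [
    TokenRule("keyword.control", r"\b(?:if|else|while)\b"),
    TokenRule("variable.other", r"[A-Za-z_][A-Za-z0-9_]*", not_followed_by="[A-Za-z0-9_]"),
    TokenRule("constant.numeric", r"[0-9]+", not_followed_by="[0-9]"),
]

grammar = to_textmate("Demo", "source.demo", RULES)
print(json.dumps(grammar, indent=2))
```

Keeping the intermediate form independent of any editor means that only the final mapping has to be rewritten to target another editor or highlighter.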

Contact: Tijs van der Storm.