Deriving Syntax Highlighters from Context-Free Grammars

Language workbenches provide an integrated experience for developing software languages. Language users are supported by editor services such as syntax highlighting, jump-to-definition, and outlines. The goal of this project is to leverage existing editors to provide such services. More specifically, the project is about generating syntax highlighting support from context-free grammars.

Many editors (e.g., VS Code, TextMate, Sublime Text, Atom, ACE, CodeMirror) and highlighters (e.g., Highlight.js, GitHub) accept state-based language definitions for syntax highlighting. In the Rascal language workbench, however, coloring is derived from context-free grammars. How can we derive state-based highlighters from Rascal's context-free grammars?
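To make "state-based language definition" concrete, here is a minimal sketch in Python. It does not follow the format of any particular editor: the rule table, the `push:`/`pop` action strings, and the `highlight` function are all hypothetical names chosen for illustration. Each state lists regular-expression rules, and a rule may push or pop a state, which is roughly how such definitions track context such as "inside a string".

```python
import re

# Hypothetical state-based definition: each state maps to a list of
# (regex, token class, action) rules. Actions are "push:<state>", "pop", or None.
RULES = {
    "root": [
        (r'"', "string", "push:string"),
        (r"\b(?:if|else|while)\b", "keyword", None),
        (r"\d+", "number", None),
    ],
    "string": [
        (r"\\.", "escape", None),
        (r'"', "string", "pop"),
        (r'[^"\\]+', "string", None),
    ],
}

def highlight(text, rules=RULES):
    """Return a list of (token class, lexeme) pairs for the styled parts of text."""
    stack = ["root"]  # the current state is the top of the stack
    pos = 0
    tokens = []
    while pos < len(text):
        for pattern, token_class, action in rules[stack[-1]]:
            m = re.compile(pattern).match(text, pos)
            if m and m.end() > pos:
                tokens.append((token_class, m.group()))
                pos = m.end()
                if action == "pop":
                    stack.pop()
                elif action:
                    stack.append(action.split(":", 1)[1])
                break
        else:
            pos += 1  # no rule matched: leave this character unstyled
    return tokens
```

For example, `highlight('if x "a" 42')` classifies `if` as a keyword, the quoted text as string tokens, and `42` as a number, while `x` and the spaces stay unstyled. The open question of the project is how to obtain rule tables of this kind automatically from a grammar rather than writing them by hand.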

As a starting point, you'll take the approach detailed in this paper: Mohri and Nederhof, "Regular approximation of context-free grammars through transformation", in Robustness in Language and Speech Technology, Springer, 2001.
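As a rough illustration of that approach, the following Python sketch shows the core grammar transformation as we read it from the paper. For a set M of mutually recursive nonterminals, every rule is split at each occurrence of a nonterminal from M, and a fresh primed nonterminal A' marks "the point just after an A". The grammar encoding (a dict from nonterminal to a list of symbol tuples), the function name `approximate`, and the priming convention are assumptions made for this sketch. Computing the sets M and skipping those that are already strongly regular is left out.

```python
from collections import defaultdict

def prime(nonterminal):
    # Hypothetical naming convention for the fresh nonterminal A'.
    return nonterminal + "'"

def approximate(grammar, M):
    """Transform the rules of the mutually recursive set M into right-linear form.

    grammar: dict mapping each nonterminal to a list of right-hand sides,
             where a right-hand side is a tuple of symbols.
    M:       set of mutually recursive nonterminals to transform.
    The language of the result is a superset of the original language.
    """
    result = defaultdict(list)
    for lhs, alternatives in grammar.items():
        if lhs not in M:
            result[lhs].extend(alternatives)  # rules outside M are kept as-is
            continue
        for rhs in alternatives:
            current = lhs   # left-hand side of the rule fragment being built
            segment = []    # symbols outside M collected since the last split
            for sym in rhs:
                if sym in M:
                    # Split here: current -> segment sym
                    result[current].append(tuple(segment) + (sym,))
                    current = prime(sym)
                    segment = []
                else:
                    segment.append(sym)
            # The final fragment returns to "just after lhs".
            result[current].append(tuple(segment) + (prime(lhs),))
    for a in sorted(M):
        result[prime(a)].append(())  # A' -> epsilon
    return dict(result)
```

Applied to the grammar `S -> ( S ) | a` with `M = {"S"}`, this yields `S -> ( S | a S'` and `S' -> ) S' | epsilon`. The result no longer checks that parentheses are balanced, which is the kind of precision a highlighter may be able to give up. Because the transformed rules are right-linear within M, each nonterminal can in principle be read as a state of a finite automaton.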

Resources on state-based language tokenizers:

Deliverables:

  • A prototype for transforming grammars into highlighters (in Rascal).
  • A precise description of the algorithm, including limitations and trade-offs.
  • Evaluation of the prototype on several Rascal-defined languages, including Rascal itself.

Contact: Tijs van der Storm.