Deriving Syntax Highlighters from Context-Free Grammars
Language workbenches provide an integrated environment for developing software languages. They support language users with editor services such as syntax highlighting, jump-to-definition, and outlines. The goal of this project is to leverage existing editors to provide such services. More specifically, the project is about generating syntax highlighting support from context-free grammars.
Many editors (e.g., VS Code, TextMate, Sublime Text, Atom, ACE, CodeMirror) and highlighters (e.g., Highlight.js, GitHub) accept state-based language definitions for syntax highlighting. In the Rascal language workbench, however, coloring is derived from context-free grammars. How can we derive state-based highlighters from Rascal's context-free grammars?
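To make "state-based language definition" concrete, here is a minimal sketch in Python. The rule format (regex, token class, next state), the DEFINITION table, and the highlight function are assumptions made for illustration; they do not reproduce the definition format of any particular editor.

```python
import re

# Hypothetical state-based definition: each state maps to an ordered list of
# (regex, token class, next state) rules. A next state of None means
# "stay in the current state"; a token class of None means "emit nothing".
DEFINITION = {
    "root": [
        (r'"', "string", "string"),               # opening quote enters "string"
        (r"//[^\n]*", "comment", None),
        (r"\b(?:if|else|while)\b", "keyword", None),
        (r"[A-Za-z_]\w*", "identifier", None),
        (r"\s+", None, None),                     # skip whitespace
    ],
    "string": [
        (r"\\.", "string", None),                 # escaped character
        (r'"', "string", "root"),                 # closing quote returns to "root"
        (r'[^"\\]+', "string", None),
    ],
}


def highlight(text, definition=DEFINITION, start="root"):
    """Return (token class, lexeme) pairs by trying the rules of the current state in order."""
    state, pos, tokens = start, 0, []
    while pos < len(text):
        for pattern, token_class, next_state in definition[state]:
            m = re.compile(pattern).match(text, pos)
            if m and m.end() > pos:
                if token_class is not None:
                    tokens.append((token_class, m.group()))
                pos = m.end()
                state = next_state or state
                break
        else:
            # No rule matched: emit one unstyled character and move on.
            tokens.append(("text", text[pos]))
            pos += 1
    return tokens
```

Note that the word "if" is a keyword in the "root" state but plain string content in the "string" state. This dependence on the current state is what a purely regex-per-token definition cannot express.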
As a starting point, you'll take the approach detailed in this paper: Mohri and Nederhof, "Regular approximation of context-free grammars through transformation", in Robustness in Language and Speech Technology, Springer, 2001.
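The core grammar transformation from that paper can be sketched as follows. This is a minimal illustration in Python rather than Rascal. The grammar encoding (a dict from nonterminal to a list of right-hand-side tuples), the function names, and the convention of naming the fresh nonterminal A' are assumptions made for this example. For each set of mutually recursive nonterminals whose rules are not already all left-linear or all right-linear, every rule is split at the occurrences of nonterminals from that set. The result is right-linear within the set and generates a superset of the original language.

```python
def mutually_recursive_sets(grammar):
    """Return the sets of mutually recursive nonterminals (recursive SCCs)."""
    # reach[a]: nonterminals reachable from a in one or more derivation steps
    reach = {a: {s for rhs in rules for s in rhs if s in grammar}
             for a, rules in grammar.items()}
    changed = True
    while changed:  # naive transitive closure, adequate for small grammars
        changed = False
        for a in grammar:
            new = set().union(*(reach[b] for b in reach[a])) - reach[a]
            if new:
                reach[a] |= new
                changed = True
    sets = []
    for a in grammar:
        if a in reach[a] and not any(a in s for s in sets):
            sets.append({b for b in reach[a] if a in reach[b]})
    return sets


def is_strongly_regular(grammar, m):
    """True if members of m occur only rightmost, or only leftmost, in the rules of m."""
    rules = [rhs for a in m for rhs in grammar[a]]
    right_linear = all(s not in m for rhs in rules for s in rhs[:-1])
    left_linear = all(s not in m for rhs in rules for s in rhs[1:])
    return right_linear or left_linear


def transform_set(grammar, m):
    """Split every rule of the set m at each occurrence of a nonterminal from m."""
    prime = {a: a + "'" for a in m}          # assumes A' is not already in use
    new_rules = {prime[a]: [()] for a in m}  # A' -> epsilon
    for a in m:
        new_rules[a] = []
    for a in m:
        for rhs in grammar[a]:
            head, chunk = a, []
            for s in rhs:
                if s in m:
                    new_rules[head].append(tuple(chunk) + (s,))
                    head, chunk = prime[s], []
                else:
                    chunk.append(s)
            new_rules[head].append(tuple(chunk) + (prime[a],))
    return new_rules


def approximate(grammar):
    """Return a regular approximation; sets that are already regular stay untouched."""
    result = dict(grammar)
    for m in mutually_recursive_sets(grammar):
        if not is_strongly_regular(grammar, m):
            result.update(transform_set(grammar, m))
    return result
```

For the grammar S -> a S b | epsilon, encoded as {"S": [("a", "S", "b"), ()]}, the sketch yields S -> a S | S' and S' -> b S' | epsilon. The approximation accepts a* b*, which contains every a^n b^n but no longer enforces that the counts match. That loss of precision is the kind of trade-off the project is expected to document.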
Resources on state-based language tokenizers:
Deliverables:
- A prototype for transforming grammars into highlighters (in Rascal).
- A precise description of the algorithm, including limitations and trade-offs.
- Evaluation of the prototype on several Rascal-defined languages, including Rascal itself.
Tasks:
- Provide a detailed description of the abstract syntax of Rascal's grammar formalism and its semantics.
- Define the abstract syntax of an intermediate grammar formalism to represent regular approximations of context-free grammars, catering for Rascal's lexical disambiguation constructs (e.g., follow restrictions and keyword reservation).
- Transform a Rascal grammar to the intermediate representation using the Mohri/Nederhof algorithm.
- Map the resulting regular approximation to the "TextMate" state-based "mode" data type.
- Generate concrete JSON, XML, or other editor-specific formats so that the highlighters can be tested.
- Evaluate the approach using various grammars from Rascal's standard library, including Rascal's own grammar, on one or more editors.
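The last steps, mapping to a state-based "mode" and emitting a concrete editor format, might look roughly like the sketch below. It is written in Python for illustration only. The Match and Region classes are a hypothetical stand-in for the "mode" data type, and the scope names are example choices. The emitted keys (name, scopeName, patterns, match, begin, end) are those of the JSON form of TextMate grammars, which TextMate-compatible editors such as VS Code consume.

```python
import json
from dataclasses import dataclass, field


@dataclass
class Match:
    """Hypothetical mode rule: a token recognized by a single regular expression."""
    scope: str
    regex: str


@dataclass
class Region:
    """Hypothetical mode rule: a state entered at `begin` and left at `end`."""
    scope: str
    begin: str
    end: str
    inner: list = field(default_factory=list)  # rules active inside the region


def to_textmate(rule):
    """Translate one mode rule into a TextMate pattern dictionary."""
    if isinstance(rule, Match):
        return {"name": rule.scope, "match": rule.regex}
    return {
        "name": rule.scope,
        "begin": rule.begin,
        "end": rule.end,
        "patterns": [to_textmate(r) for r in rule.inner],
    }


def textmate_grammar(language, rules):
    """Serialize a list of mode rules as a TextMate grammar in JSON."""
    grammar = {
        "name": language,
        "scopeName": "source." + language.lower(),
        "patterns": [to_textmate(r) for r in rules],
    }
    return json.dumps(grammar, indent=2)


if __name__ == "__main__":
    demo = [
        Match("keyword.control.demo", r"\b(if|else|while)\b"),
        Region("string.quoted.double.demo", '"', '"',
               [Match("constant.character.escape.demo", r"\\.")]),
    ]
    print(textmate_grammar("Demo", demo))
```

Keeping the mode data type separate from the serializer means the same intermediate result could be rendered for other editors by adding another serializer, without touching the grammar transformation.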
Contact: Tijs van der Storm.