Sublime Forum

Syntax Highlighting for a non-trivial language

#1

Hello!

I’m currently developing a new language. For the sake of brevity, I have reduced it to a minimal version that still exhibits my problem.

The language has implicit and explicit strings, arrays and objects. An implicit string is just a sequence of characters that does not contain <, {, [ or ]. An explicit string looks like <saltsalt>, with text between the two occurrences of salt, where salt is an arbitrary identifier (i.e. [a-zA-Z][a-zA-Z0-9]*) and text is an arbitrary sequence of characters that does not contain the salt.
An array starts with [, followed by objects and/or strings, and ends with ]. All characters within an array that don’t belong to a nested array, an object or an explicit string belong to an implicit string, and each implicit string is as long as possible and has a length greater than 0.
An object starts with { and ends with } and consists of properties. A property starts with an identifier, followed by a colon, then optional whitespace, and then either an explicit string, an array or an object.

Sadly, I was not able to define a tmLanguage for this language that highlights the properties and explicit strings, because the explicit strings may contain line breaks, but a regular expression in a tmLanguage definition can only span a single line.
The begin/end construct does not help either, since I need a backreference to close the string only when the salt is read a second time. Even highlighting properties seems to be problematic (consider “bar:” and “bla:” in {foo:{foo:]} bar:{test:baz}] bla:{}}]), although I think that part should be possible with tmLanguage.
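To make the backreference problem concrete, here is a minimal Python sketch of the check a hand-written scanner would perform. It only illustrates the idea: `scan_explicit_string` is a hypothetical helper name, and the sketch assumes the caller has already consumed the opening `<` and the salt, so it is handed the salt and the offset where the text begins. The closing delimiter is taken to be the salt followed by `>`.

```python
from typing import Optional, Tuple


def scan_explicit_string(source: str, text_start: int, salt: str) -> Optional[Tuple[str, int]]:
    """Scan the body of an explicit string (hypothetical helper).

    Assumes the opening "<" and the salt were already consumed, so
    `text_start` is the offset of the first character of the text.
    Returns (text, end_offset), or None if the string is unterminated.
    """
    # The text may not contain the salt, so its first occurrence ends the text.
    salt_at = source.find(salt, text_start)
    if salt_at == -1:
        return None
    close_at = salt_at + len(salt)
    # The second salt must be followed directly by ">" to close the string.
    if not source.startswith(">", close_at):
        return None
    return source[text_start:salt_at], close_at + 1


# The text spans two lines, which is fine for a scanner working on the whole buffer.
example = "<abc line1\nline2 abc> rest"
print(scan_explicit_string(example, 4, "abc"))
```

The point is that the closing pattern depends on what was read at the opening, and that the search has to run across line breaks.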

However, I would prefer to write a custom lexer that splits the text into tokens, as I need to write a lexer anyway. These tokens could easily be used for syntax highlighting. Since I couldn’t find any information about that, it does not seem to be possible…
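As a rough idea of what I mean by tokens, here is a deliberately simplified Python sketch. It is not the real lexer: `Token` and `tokenize` are made-up names, explicit strings are ignored entirely, and whitespace inside objects is lumped into "text" tokens. It only tracks open brackets so that an identifier followed by a colon counts as a property inside an object but as plain text inside an array.

```python
import re
from typing import List, NamedTuple


class Token(NamedTuple):
    kind: str   # "punct", "property" or "text"
    start: int  # inclusive offset into the source
    end: int    # exclusive offset into the source


# An identifier followed by a colon, as in the property rule.
PROPERTY = re.compile(r"[a-zA-Z][a-zA-Z0-9]*:")


def tokenize(source: str) -> List[Token]:
    tokens: List[Token] = []
    stack: List[str] = []  # currently open "{" and "[" characters
    run_start = None       # start of the pending run of plain text, if any
    pos = 0

    def flush(upto: int) -> None:
        # Emit the pending plain-text run, if there is one.
        nonlocal run_start
        if run_start is not None:
            tokens.append(Token("text", run_start, upto))
            run_start = None

    while pos < len(source):
        ch = source[pos]
        in_object = bool(stack) and stack[-1] == "{"
        prop = PROPERTY.match(source, pos) if in_object else None
        if ch in "{[":
            flush(pos)
            stack.append(ch)
            tokens.append(Token("punct", pos, pos + 1))
            pos += 1
        elif ch in "}]":
            flush(pos)
            if stack:
                stack.pop()  # simplification: no check that brackets match
            tokens.append(Token("punct", pos, pos + 1))
            pos += 1
        elif prop:
            flush(pos)
            tokens.append(Token("property", pos, prop.end()))
            pos = prop.end()
        else:
            if run_start is None:
                run_start = pos
            pos += 1
    flush(len(source))
    return tokens


source = "{a:{} b:[x]}"
for token in tokenize(source):
    print(token.kind, repr(source[token.start:token.end]))
```

Each token carries offsets into the buffer, so a highlighter would only need to map token kinds to colours.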

Does anybody have an idea how I could add syntax highlighting support for this language? Thanks in advance!

#2

Doesn’t Perl allow multiline regexes with arbitrary characters to open and close them (after the m or s)? Maybe check out its .tmLanguage? The reason the tmLanguage format doesn’t allow multiline regexes is that they’re too slow (or at least, they were when TextMate came out in 2004).
As for the custom lexer, Sublime Text unfortunately doesn’t support anything like that. It only supports tmLanguage files.
Are you planning on using Lex for your lexer?

#3

Thanks for your answer! I’ll have a look at Perl’s .tmLanguage. As PHP has heredoc, its .tmLanguage must solve the same problem.
I’m planning to write the lexer by hand to reduce dependencies on other libraries and to make porting the parser to other languages easier.
Are there any plans to support custom lexers in Sublime? It’s 2015; nowadays whole IDEs with semantic analysis can run inside the browser…

#4

Not that I know of, though it’d be awesome.
