Sublime Forum

Unicode and regex

#1

The regex \w does not seem to match Unicode characters. This shows up both in the find panel, where Unicode characters do not match against \w, and in syntax definitions, where function names containing Unicode characters are not matched and therefore do not show up in “go to definition” etc.

Is this a bug or is my system configured incorrectly?

#2

I don’t know the specification for \w in the regex implementation that Sublime uses, but ECMAScript regular expressions require \w to match only a-z, A-Z, 0-9, and _. No Unicode there.
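
To make the difference concrete, here is a small sketch in Python. Note that Python's re module is only a stand-in here (it is neither the engine Sublime uses nor an ECMAScript engine), and the function name is a made-up example. The re.ASCII flag gives the ASCII-only \w described above, while Python 3's default for str patterns is a Unicode-aware \w:

```python
import re

# Hypothetical function name containing non-ASCII letters ("größe_2"),
# written with escapes so the code points are unambiguous.
name = "gr\u00f6\u00dfe_2"

# ASCII-only \w: equivalent to [A-Za-z0-9_], as ECMAScript specifies.
ascii_word = re.compile(r"\w+", re.ASCII)

# Unicode-aware \w: Python 3's default behaviour for str patterns.
unicode_word = re.compile(r"\w+")

ascii_matches = ascii_word.findall(name)
unicode_matches = unicode_word.findall(name)

print(ascii_matches)    # ['gr', 'e_2'] - the name is split at the non-ASCII letters
print(unicode_matches)  # ['größe_2']  - the whole name is one word
```

With the ASCII-only definition the identifier breaks into two pieces, which is the kind of behaviour that would stop a name from being treated as a single word.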

#3

According to this definition of Oniguruma regular expressions, which is the flavour that Sublime Text uses, Unicode characters should be matched by \w:

geocities.jp/kosako3/oniguruma/doc/RE.txt

#4

I think ST uses Perl regex syntax via Boost everywhere except in .tmLanguage files:
docs.sublimetext.info/en/latest/ … rview.html

#5

Ah OK, I was mistaken: Unicode characters are matched by \w in syntax definitions, and this would explain why. I think it would make sense for Sublime to use one regular expression flavour consistently throughout. At the moment, function names with Unicode characters are correctly highlighted and show up in the Go To Definition panel, but they are not selected when I hit cmd-D because the editor does not identify them as a “whole word”.
