Sublime Forum

Bug report - javascript regex syntax bug

#1

In the file JavaScript.tmLanguage, the pattern that begins a regex literal is:
(?<=[=(:]|^|return|&&|\|\||!)\s*(/)(?![/*+{}?])

bad case:

var a = [/\"/,/\"/g];
var a = [/\"/,  /\"/g];

the correct pattern should be
(?<=[\s\[,=(:]|^|return|&&|\|\||!)\s*(/)(?![/*+{}?])
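
A quick way to compare the two lookbehinds outside Sublime is to run them against the bad case in Node. This is only a sketch. It assumes the patterns read `[=(:]` and `[\s\[,=(:]` inside the lookbehind and `[/*+{}?]` inside the lookahead (the square brackets appear to have been dropped from the post), and it uses JavaScript's own RegExp (with lookbehind support) rather than Sublime's regex engine, so it illustrates the idea rather than the grammar's exact behavior. `regexOpeners` is a made-up helper name.

```javascript
// Sketch only: JavaScript's RegExp stands in for the engine Sublime Text uses
// for .tmLanguage grammars, so edge cases may differ.
// Both patterns are assumed readings with the square brackets restored.
const original = String.raw`(?<=[=(:]|^|return|&&|\|\||!)\s*(/)(?![/*+{}?])`;
const proposed = String.raw`(?<=[\s\[,=(:]|^|return|&&|\|\||!)\s*(/)(?![/*+{}?])`;

// Returns the index of every slash that the pattern treats as a regex opener.
function regexOpeners(pattern, source) {
  const re = new RegExp(pattern, 'g');
  // Each match is optional whitespace followed by the slash, so the slash
  // is always the last character of the match.
  return Array.from(source.matchAll(re), (m) => m.index + m[0].length - 1);
}

const badCase = String.raw`var a = [/\"/,/\"/g];`;

console.log(regexOpeners(original, badCase)); // no openers found
console.log(regexOpeners(proposed, badCase)); // openers after "[" and after ","
```

With the original lookbehind neither literal is recognized, because `[` and `,` are not in the list of characters allowed before the opening slash.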

thanks

#2

These things seem to get overlooked. This was reported here as well.

viewtopic.php?f=3&t=3696&start=0&hilit=javascript+regex

Though, your solution works much better than what I had suggested; yours covers more scenarios. Thanks. Hopefully issues like this will get picked up and patched, or these languages can be turned over to the community to get fixed up.

#3

Here’s a regex that this particular fix doesn’t address. If I can figure out a good fix I’ll post it here. Meanwhile, maybe one of you can figure it out. The problem is that there’s a slash (/) inside of a bracketed character list. I came across it in the jQuery.uni-form.js file.

/^[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*@(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?$/i

I presume an unescaped slash is okay inside of brackets. The JS interpreter doesn’t seem to complain.

#4

This isn’t a bug. You need to escape your forward slashes.

#5

And yet the code executes. You can try it yourself in any JavaScript console. Is this a case of the JavaScript interpreter being generous? Actually, no. This is the correct behavior.

Both of these work just fine in Chrome, Safari, and Firefox:

/[abc/]/.test('qra');
/[abc\/]/.test('qra');

This is because a character class is itself already “escaped.” That’s why you don’t need to escape parentheses or bar characters either, but you do have to escape the close-bracket ( ] ) character to treat it as part of the character class. The regex parser doesn’t care about slashes being delimiters until the character class has exited.

So what is needed is a different set of considerations for the way characters (perhaps just unescaped slashes) are interpreted inside bracketed character classes, and special treatment for ] - so that it doesn’t break out of ‘character class mode.’
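
To make the 'character class mode' idea concrete, here is a rough sketch of a scanner that finds the slash closing a regex literal. It is not how Sublime or any JS engine actually does it; `findRegexEnd` is a hypothetical helper written just for this post, and it only tracks three things: backslash escapes, whether we are inside `[...]`, and the closing slash.

```javascript
// Hypothetical helper, not taken from any grammar or engine: given the index
// of the opening slash, return the index of the slash that closes the literal,
// or -1 if the literal never closes on this line.
function findRegexEnd(source, openIndex) {
  let inClass = false;
  for (let i = openIndex + 1; i < source.length; i++) {
    const ch = source[i];
    if (ch === '\\') {
      i++; // skip the escaped character, including \] and \/
    } else if (ch === '\n') {
      return -1; // regex literals cannot span lines
    } else if (inClass) {
      if (ch === ']') inClass = false; // only an unescaped ] leaves the class
    } else if (ch === '[') {
      inClass = true;
    } else if (ch === '/') {
      return i; // first slash outside a class ends the literal
    }
  }
  return -1;
}

const plain = String.raw`/[abc/]/.test('qra')`;
const escaped = String.raw`/[abc\/]/.test('qra')`;
console.log(findRegexEnd(plain, 0));   // 7
console.log(findRegexEnd(escaped, 0)); // 8
```

In both cases the slash inside the brackets is skipped and the slash just before `.test` is reported as the end of the literal.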

#6

Hmm. I guess I always code my regex defensively. I guess it is just habit. If I am between two special characters (double quotes, single quotes, pipes, forward slashes, etc.) I escape all instances of them between the main ones. I think a lot of people do, which is why this hasn’t been mentioned before now.

If it runs, I guess you’re right. Maybe I will take a look into it.

#7

This works. Not sure if it has adverse effects, but I don’t think it does. Here we are just adding an extra rule between the "/"s to look for matching ] and allow everything between them.

Though this works, I think not explicitly escaping your forward slashes is sloppy. You will notice that if you have a linter, it will complain that you haven’t escaped them. Reading up, what is actually happening is that JS is doing it for you, but if you do it, JS doesn’t have to. It is kind of a safety net if you are lazy and don’t escape them. I still think escaping them yourself is the better way to go instead of using this. This just encourages the user to continue being lazy with their escaping.

<dict>
    <key>begin</key>
    <string>(?&lt;=[\s\[,=(:]|^|return|&amp;&amp;|\|\||!)\s*(/)(?![/*+{}?])</string>
    <key>beginCaptures</key>
    <dict>
        <key>1</key>
        <dict>
            <key>name</key>
            <string>punctuation.definition.string.begin.js</string>
        </dict>
    </dict>
    <key>end</key>
    <string>(/)[igm]*</string>
    <key>endCaptures</key>
    <dict>
        <key>1</key>
        <dict>
            <key>name</key>
            <string>punctuation.definition.string.end.js</string>
        </dict>
    </dict>
    <key>name</key>
    <string>string.regexp.js</string>
    <key>patterns</key>
    <array>
        <dict>
            <key>match</key>
            <string>\\.</string>
            <key>name</key>
            <string>constant.character.escape.js</string>
        </dict>
        <dict>
            <key>begin</key>
            <string>\[</string>
            <key>end</key>
            <string>\]</string>
            <key>name</key>
            <string>string.regexp.js</string>
            <key>patterns</key>
            <array>
                <dict>
                    <key>match</key>
                    <string>\\.</string>
                    <key>name</key>
                    <string>constant.character.escape.js</string>
                </dict>
            </array>
        </dict>
    </array>
</dict>

#8

Thanks! The extra rule for matched brackets works well, and it is smart enough by default to ignore the escaped close-bracket. I don’t foresee any adverse effects. As an added ‘feature’ if there’s an un-closed character class in a regex it is now made noticeable by the unusual syntax highlighting. For consistency another rule could be added to count regular parentheses too, though I suppose that might seem obnoxious after a while.
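
For anyone curious what "counting parentheses too" might look like, here is a small sketch in plain JavaScript rather than as a grammar rule. `checkRegexBody` is a hypothetical name, and the checks are deliberately minimal: it assumes the text between the delimiting slashes is passed in, ignores anything after a backslash, and treats parentheses inside a character class as ordinary characters.

```javascript
// Hypothetical checker: reports simple structural problems in the body of a
// regex (the text between the delimiting slashes). Not a full regex parser.
function checkRegexBody(body) {
  let inClass = false;
  let depth = 0; // number of groups currently open
  for (let i = 0; i < body.length; i++) {
    const ch = body[i];
    if (ch === '\\') {
      i++; // skip whatever is escaped
      continue;
    }
    if (inClass) {
      if (ch === ']') inClass = false; // parentheses in a class are literal
      continue;
    }
    if (ch === '[') {
      inClass = true;
    } else if (ch === '(') {
      depth++;
    } else if (ch === ')') {
      if (depth === 0) return 'unmatched )';
      depth--;
    }
  }
  if (inClass) return 'unclosed character class';
  if (depth > 0) return 'unclosed group';
  return 'ok';
}

console.log(checkRegexBody('[abc/]'));   // ok
console.log(checkRegexBody('[abc/'));    // unclosed character class
console.log(checkRegexBody('(a[)]b'));   // unclosed group
```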

#9

Glad I could help. You can tweak it any way to suit your needs. The tmLanguage files can seem daunting at first, but once you understand how the rules work, it isn’t too bad.
