bug report - javascript regex syntax bug

Postby akira on Tue Mar 06, 2012 11:08 am

In the file JavaScript.tmLanguage, the begin pattern for regex literals is:
<string>(?&lt;=[=(:]|^|return|&amp;&amp;|\|\||!)\s*(/)(?![/*+{}?])</string>

Bad cases (regex literals that follow "[" or "," are not highlighted):
Code:
var a = [/\"/,/\"/g];
var a = [/\"/,  /\"/g];


The correct pattern should be:
<string>(?&lt;=[\s\[\,=(:]|^|return|&amp;&amp;|\|\||!)\s*(/)(?![/*+{}?])</string>
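A rough way to check the difference outside the editor is to run both begin patterns over the failing lines. This is only a sketch: Sublime Text uses the Oniguruma engine, while here the patterns are run through JavaScript's own RegExp (which needs lookbehind support), with the XML entities decoded by hand. The helper name slashStarts is made up for illustration.

```javascript
// Begin patterns from JavaScript.tmLanguage with the XML entities decoded
// (&lt; -> <, &amp; -> &). String.raw keeps the backslashes intact.
const oldBegin = String.raw`(?<=[=(:]|^|return|&&|\|\||!)\s*(/)(?![/*+{}?])`;
const newBegin = String.raw`(?<=[\s\[\,=(:]|^|return|&&|\|\||!)\s*(/)(?![/*+{}?])`;

// Hypothetical helper: index of every "/" that the pattern accepts as the
// start of a regex literal in the given line.
function slashStarts(pattern, line) {
  const re = new RegExp(pattern, "g");
  // Each match is optional whitespace plus the slash, so the slash is last.
  return [...line.matchAll(re)].map(m => m.index + m[0].length - 1);
}

const line1 = String.raw`var a = [/\"/,/\"/g];`;
const line2 = String.raw`var a = [/\"/,  /\"/g];`;

console.log(slashStarts(oldBegin, line1), slashStarts(newBegin, line1));
console.log(slashStarts(oldBegin, line2), slashStarts(newBegin, line2));
```

With the original pattern neither line yields a regex start; with the extended lookbehind the opening slash of both literals is found on each line.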

thanks
akira
 
Posts: 7
Joined: Thu Jan 19, 2012 3:34 am

Re: bug report - javascript regex syntax bug

Postby facelessuser on Wed Mar 07, 2012 9:05 pm

These things seem to get overlooked. This was reported here as well.

viewtopic.php?f=3&t=3696&start=0&hilit=javascript+regex

Though, your solution works much better than what I had suggested; yours covers more scenarios. Thanks. Hopefully issues like this will get picked up and patched, or these languages can be turned over to the community to get fixed up.
facelessuser
 
Posts: 1567
Joined: Tue Apr 05, 2011 7:38 pm

Re: bug report - javascript regex syntax bug

Postby thinkyhead on Tue Apr 17, 2012 1:06 am

Here's a regex that this particular fix doesn't address. If I can figure out a good fix I'll post it here. Meanwhile, maybe one of you can figure it out. The problem is that there's a slash (/) inside of a bracketed character list. I came across it in the jQuery.uni-form.js file.

Code:
/^[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*@(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?$/i

I presume an unescaped slash is okay inside of brackets. The JS interpreter doesn't seem to complain.
thinkyhead
 
Posts: 3
Joined: Tue Apr 17, 2012 12:58 am

Re: bug report - javascript regex syntax bug

Postby facelessuser on Tue Apr 17, 2012 2:02 am

This isn't a bug. You need to escape your forward slashes.
facelessuser
 
Posts: 1567
Joined: Tue Apr 05, 2011 7:38 pm

Re: bug report - javascript regex syntax bug

Postby thinkyhead on Tue Apr 17, 2012 5:17 am

And yet the code executes. You can try it yourself in any JavaScript console. Is this a case of the JavaScript interpreter being generous? Actually, no. This is the correct behavior.

Both of these work just fine in Chrome, Safari, and Firefox:
Code:
/[abc/]/.test('qra');
/[abc\/]/.test('qra');

This is because a character class is itself already "escaped." That's why you don't need to escape parentheses or bar characters either, but you do have to escape the close-bracket ( ] ) character to treat it as part of the character class. The regex parser doesn't care about slashes being delimiters until the character class has exited.
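A slightly stricter check of that behavior, as a sketch to paste into a console (the variable names are only for illustration, and the patterns are anchored so each test looks at a single character):

```javascript
// An unescaped "/" inside a character class does not end the regex literal.
const unescaped = /^[abc/]$/;
const escaped = /^[abc\/]$/;

// Both forms accept exactly the same single characters.
const samples = ["a", "/", "q", "\\"];
const sameResults = samples.every(s => unescaped.test(s) === escaped.test(s));

// "]" is different: it has to be escaped to count as a member of the class.
const bracketInClass = /^[abc\]]$/; // class containing a, b, c and ]
const bracketOutside = /^[abc]]$/;  // class [abc] followed by a literal ]

console.log(sameResults);               // escaped and unescaped "/" agree
console.log(bracketInClass.test("]"));  // "]" is a member of the class
console.log(bracketOutside.test("a]")); // here "]" sits after the class
```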

So what is needed is a different set of considerations for the way characters (perhaps just unescaped slashes) are interpreted inside bracketed character classes, and special treatment for \] - so that it doesn't break out of 'character class mode.'
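One way to picture this "character class mode" is a tiny scanner that looks for the closing slash of a regex literal. This is a hedged sketch, not how the tmLanguage engine works internally; findRegexEnd is a hypothetical name and the function assumes it is handed the index of the opening slash.

```javascript
// Hypothetical helper: given source text and the index of a regex literal's
// opening "/", return the index of the closing "/", or -1 if there is none.
function findRegexEnd(src, start) {
  let inClass = false; // true while we are between "[" and the matching "]"
  for (let i = start + 1; i < src.length; i++) {
    const ch = src[i];
    if (ch === "\\") {
      i++; // skip the escaped character, so \/ and \] never end anything
      continue;
    }
    if (ch === "\n") return -1; // a regex literal cannot span lines
    if (inClass) {
      if (ch === "]") inClass = false; // leave character class mode
    } else if (ch === "[") {
      inClass = true; // enter character class mode
    } else if (ch === "/") {
      return i; // an unescaped slash outside a class closes the literal
    }
  }
  return -1; // unterminated literal or unclosed character class
}

console.log(findRegexEnd("/[abc/]/.test('qra');", 0));
console.log(findRegexEnd(String.raw`/[abc\/]/.test('qra');`, 0));
```

The slash inside the brackets is skipped in both calls, so the scanner reports the slash just before `.test` as the end of the literal.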
thinkyhead
 
Posts: 3
Joined: Tue Apr 17, 2012 12:58 am

Re: bug report - javascript regex syntax bug

Postby facelessuser on Tue Apr 17, 2012 1:14 pm

Hmm. I guess I always code my regex defensively; it is just habit. If I am between two special characters (double quotes, single quotes, pipes, forward slashes, etc.), I escape all instances of them between the main ones. I think a lot of people do, which is why this hasn't been mentioned before now.

If it runs, I guess you're right. Maybe I will take a look into it.
facelessuser
 
Posts: 1567
Joined: Tue Apr 05, 2011 7:38 pm

Re: bug report - javascript regex syntax bug

Postby facelessuser on Tue Apr 17, 2012 1:37 pm

This works. Not sure if it has adverse effects, but I don't think it does. Here we are just adding an extra rule between the "/"s to look for matching [] and allow everything between them.

Though this works, I think not explicitly escaping your forward slashes is sloppy. You will notice that if you have a linter, it will complain that you haven't escaped them. Reading up, what is actually happening is that JS is doing it for you, but if you do it, JS doesn't have to. It is kind of a catch for when you are lazy and don't escape them. I still think escaping them yourself is the better way to go instead of using this. This just encourages the user to keep being lazy with their escaping.

Code:
      <dict>
         <key>begin</key>
         <string>(?&lt;=[\s\[\,=(:]|^|return|&amp;&amp;|\|\||!)\s*(/)(?![/*+{}?])</string>
         <key>beginCaptures</key>
         <dict>
            <key>1</key>
            <dict>
               <key>name</key>
               <string>punctuation.definition.string.begin.js</string>
            </dict>
         </dict>
         <key>end</key>
         <string>(/)[igm]*</string>
         <key>endCaptures</key>
         <dict>
            <key>1</key>
            <dict>
               <key>name</key>
               <string>punctuation.definition.string.end.js</string>
            </dict>
         </dict>
         <key>name</key>
         <string>string.regexp.js</string>
         <key>patterns</key>
         <array>
            <dict>
               <key>match</key>
               <string>\\.</string>
               <key>name</key>
               <string>constant.character.escape.js</string>
            </dict>
            <dict>
               <key>begin</key>
               <string>\[</string>
               <key>end</key>
               <string>\]</string>
               <key>name</key>
               <string>string.regexp.js</string>
               <key>patterns</key>
               <array>
                  <dict>
                     <key>match</key>
                     <string>\\.</string>
                     <key>name</key>
                     <string>constant.character.escape.js</string>
                  </dict>
               </array>
            </dict>
         </array>
      </dict>
facelessuser
 
Posts: 1567
Joined: Tue Apr 05, 2011 7:38 pm

Re: bug report - javascript regex syntax bug

Postby thinkyhead on Tue Apr 24, 2012 12:04 am

Thanks! The extra rule for matched brackets works well, and it is smart enough by default to ignore the escaped close-bracket. I don't foresee any adverse effects. As an added 'feature' if there's an un-closed character class in a regex it is now made noticeable by the unusual syntax highlighting. For consistency another rule could be added to count regular parentheses too, though I suppose that might seem obnoxious after a while.
thinkyhead
 
Posts: 3
Joined: Tue Apr 17, 2012 12:58 am

Re: bug report - javascript regex syntax bug

Postby facelessuser on Tue Apr 24, 2012 12:32 am

Glad I could help. You can tweak it any way to suit your needs. The tmLanguage files can seem daunting at first, but once you understand how the rules work, it isn't too bad.
facelessuser
 
Posts: 1567
Joined: Tue Apr 05, 2011 7:38 pm

