
Oniguruma syntax definition help.

Post by askeeve on Mon May 14, 2012 7:22 pm

Hi guys. I have a bit of a complicated one here. I really wish ST2 used the same regex engine for syntax files as it does for search, so I could test my regexes more easily than by rewriting the syntax file each time (if anybody knows a better way to test syntax definition matches, please let me know).

Anyway, I'm trying to make a slightly complicated match here. In Oniguruma, with JSON escaping, I believe it is:
Code:
(?<re>\\{(?:[^()]*+\\k{re})*\\})
although that's not working well for me, so I could be wrong. Using the regex search, I can achieve what I want with this:
Code:
\{(?:[^{}]*+|(?0))*\}
But I can't seem to get a similar result in my syntax definition file. Right now it's only matching empty brackets: {}
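For reference, a likely reason the attempt above fails: in Oniguruma, \k<name> is a backreference (it re-matches the exact text the group already captured), while \g<name> is a subexpression call, which is what recursion needs. The attempt also excludes () instead of {} in the character class and has no '|' between the two alternatives. A sketch of the recursive match as a tmLanguage rule, assuming ST2's bundled Oniguruma supports subexpression calls as the Oniguruma docs describe; the scope name is hypothetical. The unnamed form \{(?:[^{}]|\g<0>)*\} should be the equivalent of the (?0) search regex. In an XML plist the '<' must be written as &lt;; in a JSON-tmLanguage file you double each backslash instead, e.g. \\g<re>.

```xml
<!-- Hypothetical scope name. Matches one balanced {...} group nested to any depth.
     \g<re> calls the named group again (recursion); it is not a backreference.
     A "match" rule only ever sees one line, so braces spanning lines will not match. -->
<dict>
   <key>name</key>
   <string>meta.braces.example</string>
   <key>match</key>
   <string>(?&lt;re>\{(?:[^{}]|\g&lt;re>)*\})</string>
</dict>
```

The single-line limit matters for the longer example later in this thread, where the braces run over several lines; that case needs begin/end rules instead of one regex.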

While I have your attention, I should probably mention my end goal, in case any of you can think of an alternate solution I could explore. My goal is to capture all the text between the first ';' that is not inside braces and the first ":Code" following it. Essentially, I'm detecting orphaned code, which the language I'm writing this file for interprets as a comment. I can't simply match between a ';' and ':Code', because the match might start either at the ';' inside the {}'s (the one in {bar,blah;}):

@foo{bar,blah;};graaa ; asdfl
dsfsh;

:Code

or at the last ';' (the one after dsfsh):

@foo{bar,blah;};graaa ; asdfl
dsfsh;

:Code

when what I want is to start at the first ';' outside the braces (the one right after the closing }):

@foo{bar,blah;};graaa ; asdfl
dsfsh;

:Code

My current strategy has been to try to match the outermost braces first ({bar,blah;} here):

@foo{bar,blah;};graaa ; asdfl
dsfsh;
:Code

and then match from the first ';' after them, which would achieve what I want.
askeeve
 
Posts: 51
Joined: Wed Apr 18, 2012 1:33 pm

Re: Oniguruma syntax definition help.

Post by atomi on Mon May 14, 2012 7:53 pm

I'm not familiar with Oniguruma, but you can try using a negative lookahead that will prevent matching a ';' followed by a closing curly bracket:

begin: ;(?!})
end: (?i:\:code)
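Written out in the XML plist form nick uses further down, atomi's begin/end pair would look roughly like this (the scope name is a placeholder). Note that the (?!}) lookahead only rejects a ';' that is immediately followed by '}'; it knows nothing about how deeply the ';' is nested.

```xml
<!-- atomi's suggestion as a tmLanguage rule; the scope name is a placeholder.
     begin: a ';' not immediately followed by '}'
     end:   ':Code', case-insensitive because of the (?i:...) group -->
<dict>
   <key>name</key>
   <string>comment.block.example</string>
   <key>begin</key>
   <string>;(?!})</string>
   <key>end</key>
   <string>(?i:\:code)</string>
</dict>
```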

Edit: Also check out http://gskinner.com/RegExr/; it's great for testing.
Edit 2: I guess I am familiar with Oniguruma.
Last edited by atomi on Tue May 15, 2012 9:33 pm, edited 2 times in total.
atomi
 
Posts: 342
Joined: Thu Jan 20, 2011 5:06 pm
Location: Los Angeles CA US

Re: Oniguruma syntax definition help.

Post by askeeve on Mon May 14, 2012 8:25 pm

The problem is there could be an ending bracket that I would want to match. This is the match I would want to make in the following:

@foo{asldfkh}; @bar{}

Because after the ';', the @bar{} becomes orphaned code. Using a negative lookahead, I would not match that ';'. I've been trying a lot. Am I correct that ST2 uses Oniguruma for syntax definition files?

Also, does anybody know of a similar tool to the gskinner site that uses Oniguruma? I specifically want to test its syntax as I'm pretty sure that's where I'm running into problems here.

Re: Oniguruma syntax definition help.

Post by nick. on Mon May 14, 2012 9:03 pm

You are correct: viewtopic.php?f=3&t=6354
Oniguruma docs: http://www.geocities.jp/kosako3/oniguruma/

It's pretty consistent with other regex engines. The only thing I've missed is the conditional as described in that thread.

I'll give it a shot...

Edit: Unless I'm missing something, this is just a matter of establishing precedence:
[Screenshot: Capture.PNG]

Code:
<key>patterns</key>
<array>
   <dict>
      <key>name</key>
      <string>entity.name.function</string>
      <key>match</key>
      <string>.*?(?={)</string>
   </dict>

   <dict>
      <key>name</key>
      <string>two</string>
      <key>begin</key>
      <string>{</string>
      <key>end</key>
      <string>}</string>
   </dict>

   <dict>
      <key>name</key>
      <string>comment</string>
      <key>begin</key>
      <string>;</string>
      <key>end</key>
      <string>Code</string>
   </dict>
</array>


Ignore the fact that the middle example is treated as a comment because of the stray semicolon on line 5... ;)

Edit 2: It seems I might've missed a detail: I think you want to include the semicolon that follows the closing bracket. I'll have another go.
Not extensively tested:
Code:
<key>patterns</key>
<array>
   <dict>
      <key>name</key>
      <string>two</string>
      <key>begin</key>
      <string>{</string>
      <key>end</key>
      <string>};</string>
   </dict>

   <dict>
      <key>contentName</key>
      <string>comment</string>
      <key>begin</key>
      <string>(?&lt;=};)</string>
      <key>end</key>
      <string>:Code</string>
   </dict>

   <dict>
      <key>name</key>
      <string>entity.name.function</string>
      <key>match</key>
      <string>[^{]*?(?={)</string>
   </dict>
</array>

[Screenshot: Capture.PNG]
nick.
 
Posts: 266
Joined: Wed Jan 18, 2012 3:45 am

Re: Oniguruma syntax definition help.

Post by atomi on Mon May 14, 2012 11:10 pm

Hey I tried!
Nice work @nick; really thorough.

I've been having some memory problems (I'm recuperating from surgery), but now I definitely remember that post.
Again, very nice job!
Edit: Sorry for the noise :)
Last edited by atomi on Tue May 15, 2012 4:06 pm, edited 1 time in total.

Re: Oniguruma syntax definition help.

Post by askeeve on Tue May 15, 2012 2:09 pm

I think you've given me some good things to work with. The language I'm working with is non-standard and allows some pretty weird constructs, so I'm afraid the answer isn't completely there, but you've definitely helped me a lot. One further question: can anybody help me understand how the order of matching is determined in ST2 syntax files, and what effect each match has on the patterns below it? My impression is that once a match has been made, that text is excluded from all patterns below it. But does this apply to the actual regex match, or does it just mean that once a scope has been assigned to some text, it won't be reassigned by a later match?

A specific example might make this question clearer. I have one or two other matches that pick out specific content inside braces, and to make these matches I am already matching on the braces. Does this mean that if I try to match on the braces again later (even if I'm not assigning them a scope and am only using them as a bound on my actual match), the regex won't find anything because they've already been matched? I know this is true within one regex (which is why zero-width assertions, a.k.a. lookaround, are so useful).

Re: Oniguruma syntax definition help.

Post by askeeve on Tue May 15, 2012 2:17 pm

Also, how would you adapt your {} matching with begin and end to account for nested {}'s? Thank you so much. This was why I was attempting something with a recursive match using Oniguruma's named subexpression.
Code:
(?<re>\\{(?:[^()]*+\\k{re})*\\})
I'm wondering if I was just not capturing it because of precedence with another match in my syntax file.

Re: Oniguruma syntax definition help.

Post by nick. on Tue May 15, 2012 3:49 pm

Assume two files, the source code file ("source") and the syntax tmLanguage file ("syntax"). The syntax is always parsed top to bottom. The source is parsed line by line, left to right, top to bottom against the syntax until a match is found. When that happens, the cursor is moved to the end of the match in the source and parsing continues again from the top of the syntax. The important takeaway is that nothing can be matched twice.
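To make the precedence point concrete: as far as I understand the TextMate-style grammars that ST2 follows, at the current position every rule in the active patterns array is tried against the rest of the line, the rule whose match starts earliest wins, and list order only breaks ties between matches that start at the same column. In the hypothetical fragment below, on the source line foo;bar the comment rule starts at the ';', which comes before "bar", so it wins even though it is listed second; its begin/end span then consumes "bar", and the keyword rule never gets to match it. Scope names are placeholders.

```xml
<!-- Hypothetical rules, applied to the source line: foo;bar -->
<key>patterns</key>
<array>
   <!-- Listed first, but its match ("bar") starts after the ';' -->
   <dict>
      <key>name</key>
      <string>keyword.example</string>
      <key>match</key>
      <string>bar</string>
   </dict>
   <!-- Its begin (';') starts earliest, so it wins; the span runs to the end of the line, swallowing "bar" -->
   <dict>
      <key>name</key>
      <string>comment.line.example</string>
      <key>begin</key>
      <string>;</string>
      <key>end</key>
      <string>$</string>
   </dict>
</array>
```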

See my answer here for dealing with nested braces.
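That linked answer isn't reproduced here, but one common way to nest begin/end rules is a repository rule that includes itself: each '{' inside the rule opens a fresh copy of it, so each '}' closes the innermost open brace, across as many lines as needed. A sketch as a fragment of the top-level dict; the rule name "braces" and the scope name are hypothetical.

```xml
<!-- Fragment of the top-level dict; "braces" and the scope name are hypothetical.
     The braces rule includes itself, so every nested '{' starts another level. -->
<key>patterns</key>
<array>
   <dict>
      <key>include</key>
      <string>#braces</string>
   </dict>
</array>
<key>repository</key>
<dict>
   <key>braces</key>
   <dict>
      <key>name</key>
      <string>meta.braces.example</string>
      <key>begin</key>
      <string>\{</string>
      <key>end</key>
      <string>\}</string>
      <key>patterns</key>
      <array>
         <!-- Recursion: a '{' inside the braces starts another copy of this rule -->
         <dict>
            <key>include</key>
            <string>#braces</string>
         </dict>
      </array>
   </dict>
</dict>
```

Scopes stack, so text inside an inner level keeps the outer level's scope as well; in g{A{B}}, the 'B' would carry meta.braces.example from both levels.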

To match content within the braces, use the <patterns> key and an array of <dicts>. If you give a specific example I can write it out for you.

Re: Oniguruma syntax definition help.

Post by askeeve on Tue May 15, 2012 4:36 pm

nick. wrote: The important takeaway is that nothing can be matched twice.


Does this mean for a given pattern or across all patterns? If I have g{A{B}} can I match both the 'A', 'B', and the whole nested {}?

The match I'm trying to make, in plain English, would be something like: from the first ';' that is not enclosed in any level of nested {}'s until the very next ":Code", without actually capturing that first ';' or the ':Code'.

Here is an example of some especially sticky code I'm trying to handle (note nested {}'s as well as ;'s and {}'s following the first ';' not enclosed in {}'s):

Code:
IF{{O}@Foo(Bar)="Baz"@FB "";
   {O,{O,@FB}@Foo(Bar)}
   @IF{@Foo(Bar)="Baz"@FB "";
       1 @Foo(Bar)@{}@Foo(Bar)}}; This would be orphaned code (comments) regardless of any ; or {}'s (like the preceding) until the next :Code


My end goal is to match this part of the above as a comment:
Code:
This would be orphaned code (comments) regardless of any ; or {}'s (like the preceding) until the next
Note that after that first ';' there can be other ;'s or {}'s, and they are all interpreted as comments.
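A sketch of one way to express this end goal with begin/end rules, as a fragment of the top-level dict; the rule and scope names are hypothetical and it has not been checked against ST2 itself. Balanced braces are consumed first by a self-including rule, so no ';' inside them can start the comment; the first ';' left at top level then opens a comment span. contentName scopes only the text between begin and end (so the ';' itself stays unscoped), and the lookahead end stops just before ':Code' without consuming it.

```xml
<!-- Hypothetical names. Top-level rule order: braces first, then the orphaned-code comment.
     The braces rule includes only #braces (not $self), so ';' inside braces stays plain text,
     and since the comment rule has no patterns, later ';' and {} inside it are just comment. -->
<key>patterns</key>
<array>
   <dict>
      <key>include</key>
      <string>#braces</string>
   </dict>
   <!-- First ';' outside any braces: everything up to (not including) :Code is comment -->
   <dict>
      <key>contentName</key>
      <string>comment.block.orphaned.example</string>
      <key>begin</key>
      <string>;</string>
      <key>end</key>
      <string>(?=:Code)</string>
   </dict>
</array>
<key>repository</key>
<dict>
   <key>braces</key>
   <dict>
      <key>begin</key>
      <string>\{</string>
      <key>end</key>
      <string>\}</string>
      <key>patterns</key>
      <array>
         <dict>
            <key>include</key>
            <string>#braces</string>
         </dict>
      </array>
   </dict>
</dict>
```

On the IF{...}; example above, the braces rule would consume everything from the first '{' through the matching '}}' across all four lines, so the comment starts at the ';' right after it.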

Re: Oniguruma syntax definition help.

Post by atomi on Tue May 15, 2012 7:35 pm

askeeve wrote: Does this mean for a given pattern or across all patterns? If I have g{A{B}} can I match both the 'A', 'B', and the whole nested {}?


I know that it's across one given pattern, BUT you can get around that by nesting another pattern inside your pattern to match the syntax within it.
If you don't want to match the starting ; and ending :Code, you can use non-capturing groups, e.g.: (?:;)
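One caveat: a non-capturing group such as (?:;) still consumes the ';', so it is still part of the rule's match and still receives the rule's name scope; "non-capturing" only means it doesn't create a numbered group. To keep the delimiters out of the scope, either scope just the content with contentName (as in nick's second example) or make both delimiters zero-width with lookaround. A sketch of the lookaround variant, with a placeholder scope name:

```xml
<!-- Placeholder scope name. The lookbehind begin and lookahead end are zero-width,
     so "name" covers only the text between the ';' and ':Code', not the delimiters. -->
<dict>
   <key>name</key>
   <string>comment.block.example</string>
   <key>begin</key>
   <string>(?&lt;=;)</string>
   <key>end</key>
   <string>(?=:Code)</string>
</dict>
```

On its own this still starts at any ';', including one inside braces, so it needs a braces rule listed ahead of it to consume those first.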
