I’ve been looking at .tmLanguage files and reading chapter 12 on “Language Grammars” in the TextMate manual, in preparation for creating syntax highlighting for SVG and OWL files.
However, in looking for example patterns in the XML.tmLanguage and XSL.tmLanguage files, located in **~\AppData\Roaming\Sublime Text 2\Packages\XML**, I have concluded that the basic syntax definitions in both those files are seriously flawed.
For instance, the first match pattern in the XML file is apparently describing the beginning of the XML declaration as well as processing instructions:
<key>begin</key>
<string>(<\?)\s*([-_a-zA-Z0-9]+)</string>
I translate this regex as “Begin with a left-angle-bracket and a question mark, then optional whitespace, followed by one or more characters, each of which is an upper- or lower-case letter, a digit, a hyphen or an underscore.”
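To check that reading, here is a rough sketch that splits the same pattern into commented pieces. It assumes Python's `re` module as a stand-in for the Oniguruma engine the editor actually uses (the two agree on constructs this simple), and the names `PI_BEGIN` and `m` are just for illustration.

```python
import re

# The begin pattern from XML.tmLanguage, written with re.VERBOSE so that
# each piece can carry its own comment. Python's re is only a stand-in
# for the editor's regex engine here.
PI_BEGIN = re.compile(r"""
    (<\?)               # group 1: the two literal characters < and ?
    \s*                 # zero or more whitespace characters
    ([-_a-zA-Z0-9]+)    # group 2: one or more letters, digits, hyphens or underscores
""", re.VERBOSE)

# An ordinary XML declaration: group 1 is the opening delimiter, group 2 the target.
m = PI_BEGIN.match('<?xml version="1.0"?>')
print(m.group(1), m.group(2))
```

Nothing in group 2 constrains which character comes first, and the `\s*` accepts whitespace before the target.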
The following are all allowed by this pattern but are not well-formed XML:
[code]<? XML ...
<?-xml ...
<?30 ...[/code]
Similarly, the match pattern for XML elements is wacky: it allows element names to begin with a hyphen, yet it does not permit periods to appear in element names at all. And since **?** [question mark] is not a wildcard but refers to whatever token precedes it, what is **((?:** supposed to mean in the pattern for any XML element name:
[code](<)((?:([-_a-zA-Z0-9]+)((:)))?([-_a-zA-Z0-9:]+))(?=(\s[^>]*)?></\2>)[/code]
Of course, anyone writing XML ought to be relying on an XML parser and not syntax highlighting to get things right. Still, the syntax highlighter needs to get the scope selectors right or the colors won't be right.
Being new to plist files and only moderately fluent in regexes, I don't propose to write the definitive XML language grammar for Sublime Text (and TextMate). Nonetheless, I can offer this match pattern in place of the existing one for processing instructions and the XML declaration:
[code]
begin
    (<\?)(((X|x)(M|m)(L|l))|(([_a-zA-Z0-9]+)([-_a-zA-Z0-9:\.]*)))\s+
captures
    1  name  markup.other.xml.tag.begin
    2  name  markup.other.xml.pi
[/code]
(I chose names that seem to fit within the naming conventions better than the existing ones.)
Can anyone here improve this and suggest fixes for other mal-engineered match patterns in XML?
Thx,
Rgr
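For anyone who wants to try the proposed pattern outside the editor, here is a minimal sketch. It assumes Python's `re` module in place of the editor's Oniguruma engine, and it uses the pattern with its brackets and parentheses balanced as I read it; `PROPOSED`, `samples` and `results` are illustrative names only.

```python
import re

# The proposed begin pattern for processing instructions and the XML
# declaration, with brackets and parentheses balanced (an assumption about
# the intended form). Python's re stands in for the editor's regex engine.
PROPOSED = re.compile(
    r"(<\?)(((X|x)(M|m)(L|l))|(([_a-zA-Z0-9]+)([-_a-zA-Z0-9:\.]*)))\s+"
)

# One ordinary declaration plus the three malformed openings from the post.
samples = ['<?xml version="1.0"?>', "<? XML ...", "<?-xml ...", "<?30 ..."]

# Record whether the pattern matches at the start of each sample.
results = {s: bool(PROPOSED.match(s)) for s in samples}

for s, ok in results.items():
    print(f"{s!r:28} {'matches' if ok else 'no match'}")
```

Run this way, the pattern rejects the leading space and the leading hyphen but still accepts `<?30 ...`, because the first character class includes `0-9`. Splitting off a first-character class without digits looks like the next fix to try.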