How can I parse the XSD regex grammar with ANTLR4?

Question

Dear ANTLR4 community,

I recently started using ANTLR4 to translate regular expressions from XSD/XML to CVC4. I use the grammar as specified by the W3C, see http://www.w3.org/TR/xmlschema11-2/#regexs . For this question I have simplified the grammar (by removing charClass) to:

grammar XSDRegExp;

regExp            :     branch ( '|' branch )* ;
branch            :     piece* ;
piece             :     atom quantifier? ;
quantifier        :     Quantifiers | '{'quantity'}' ;
quantity          :     quantRange | quantMin | QuantExact ;
quantRange        :     QuantExact ',' QuantExact ;
quantMin          :     QuantExact ',' ;
atom              :     NormalChar | '(' regExp ')' ;       // excluded | charClass  ;

QuantExact        :     [0-9]+ ;
NormalChar        :     ~[.\\?*+{}()|\[\]] ;        
Quantifiers       :     [?*+] ;

Parsing seems to go fine for:

input    a(bd){6,7}c{14,15}

However, I get an error message for:

input    12{3,4}

The error is:

line 1:0 mismatched input '12' expecting {<EOF>, '(', '|', NormalChar}

I understand that the lexer could also see a QuantExact as the first symbol, but since the parser is only looking for a NormalChar, I did not expect this error.

I tried a number of changes:

[1] Swapping the definitions of QuantExact and NormalChar. But swapping introduces an error in the first input:

line 1:6 no viable alternative at input '6'

since in that case '6' is only seen as a NormalChar and NOT as a QuantExact.

[2] Trying to create a context for QuantExact (the curly brackets of quantity), so that the lexer only produces QuantExact tokens in this limited context. But I failed to find ANTLR4 primitives for this.

So nothing seems to work. My question therefore is: can I parse this grammar with ANTLR4? And if so, how?

Answer 1

"I understand that the lexer could also see a QuantExact as the first symbol, but since the parser is only looking for a NormalChar, I did not expect this error."

The lexer does not "listen" to the parser: regardless of whether the parser is trying to match a NormalChar, the characters 12 will always be tokenized as a QuantExact. The lexer tries to match as many characters as possible, and in case of a tie, it chooses the rule defined first.
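To make the two rules above concrete (longest match first, then definition order), here is a minimal Python sketch that imitates them with the `re` module. It is not ANTLR itself, only a simulation: the `RULES` table, the `swap` helper and the choice to list the quoted literals ahead of the named rules are assumptions made for illustration.

```python
import re

# Token rules in definition order. The quoted literals from the parser rules
# are listed first: modelling them as implicit tokens that precede the named
# lexer rules is an assumption of this sketch.
RULES = [
    ("'{'", r"\{"), ("'}'", r"\}"), ("','", r","),
    ("'('", r"\("), ("')'", r"\)"), ("'|'", r"\|"),
    ("QuantExact", r"[0-9]+"),
    ("NormalChar", r"[^.\\?*+{}()|\[\]]"),
    ("Quantifiers", r"[?*+]"),
]

def swap(rules, a, b):
    """Return a copy of rules with the positions of rules a and b exchanged."""
    names = [name for name, _ in rules]
    i, j = names.index(a), names.index(b)
    out = list(rules)
    out[i], out[j] = out[j], out[i]
    return out

def tokenize(text, rules=RULES):
    """Longest match wins; on equal length the earlier rule wins."""
    tokens, pos = [], 0
    while pos < len(text):
        best = None
        for name, pattern in rules:
            m = re.compile(pattern).match(text, pos)
            # Only a strictly longer match replaces the current best,
            # so ties stay with the rule that was defined first.
            if m and (best is None or len(m.group()) > len(best[1])):
                best = (name, m.group())
        if best is None:
            raise ValueError(f"no rule matches at offset {pos}")
        tokens.append(best)
        pos += len(best[1])
    return tokens

# The parser never gets a NormalChar for the leading "12".
print(tokenize("12{3,4}")[0])

# Attempt [1] from the question: with the two rules swapped, a single
# digit ties with NormalChar and NormalChar is now defined first.
swapped = swap(RULES, "QuantExact", "NormalChar")
print(tokenize("{6,7}", swapped))
```

Note that in the swapped order a two-digit run such as 14 would still come out as QuantExact, because the longer match beats definition order; only single digits change token type.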

You could introduce a normalChar rule that matches both a NormalChar and a QuantExact, and use that rule in your atom:

atom              :     normalChar | '(' regExp ')' ;
normalChar        :     NormalChar | QuantExact ;
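As a rough check of what this change does, the following Python sketch hand-codes a recursive-descent parser for the simplified grammar and lets the caller choose which token types count as an atom. It is a simulation under assumptions, not ANTLR output: the `Parser` class, the `parse` function, the `atom_tokens` parameter and the tuple-based tree are all invented for illustration.

```python
import re

# Lexer rules in definition order; literals first (an assumption of this sketch).
RULES = [("{", r"\{"), ("}", r"\}"), (",", r","), ("(", r"\("), (")", r"\)"),
         ("|", r"\|"), ("QuantExact", r"[0-9]+"),
         ("NormalChar", r"[^.\\?*+{}()|\[\]]"), ("Quantifiers", r"[?*+]")]

def tokenize(text):
    tokens, pos = [], 0
    while pos < len(text):
        found = [(name, m.group()) for name, pat in RULES
                 for m in [re.compile(pat).match(text, pos)] if m]
        if not found:
            raise ValueError(f"no rule matches at offset {pos}")
        # max() keeps the first of several equally long matches,
        # so ties go to the rule defined first.
        tokens.append(max(found, key=lambda t: len(t[1])))
        pos += len(tokens[-1][1])
    return tokens + [("EOF", "")]

class Parser:
    def __init__(self, text, atom_tokens):
        self.toks, self.i, self.atom_tokens = tokenize(text), 0, atom_tokens

    def peek(self):
        return self.toks[self.i][0]

    def take(self, kind):
        if self.peek() != kind:
            raise SyntaxError(
                f"mismatched input {self.toks[self.i][1]!r}, expecting {kind}")
        self.i += 1
        return self.toks[self.i - 1][1]

    def reg_exp(self):                       # branch ( '|' branch )*
        branches = [self.branch()]
        while self.peek() == "|":
            self.take("|")
            branches.append(self.branch())
        return branches

    def branch(self):                        # piece*
        pieces = []
        while self.peek() == "(" or self.peek() in self.atom_tokens:
            pieces.append(self.piece())
        return pieces

    def piece(self):                         # atom quantifier?
        if self.peek() == "(":
            self.take("(")
            atom = self.reg_exp()
            self.take(")")
        else:
            atom = self.take(self.peek())
        return (atom, self.quantifier())

    def quantifier(self):                    # Quantifiers | '{' quantity '}'
        if self.peek() == "Quantifiers":
            return self.take("Quantifiers")
        if self.peek() != "{":
            return None
        self.take("{")
        low = high = self.take("QuantExact")
        if self.peek() == ",":
            self.take(",")
            high = self.take("QuantExact") if self.peek() == "QuantExact" else None
        self.take("}")
        return (low, high)

def parse(text, atom_tokens):
    parser = Parser(text, atom_tokens)
    tree = parser.reg_exp()
    parser.take("EOF")                       # reject trailing, unparsed tokens
    return tree

print(parse("12{3,4}", {"NormalChar", "QuantExact"}))
```

With `{"NormalChar"}` as the atom set the same call raises a SyntaxError on '12', mirroring the error in the question. With both token types allowed the input parses, but 12 comes back as one atom carrying the quantifier, whereas in an XSD regex 12{3,4} quantifies only the 2. That is worth keeping in mind when translating the tree.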

Another option would be to let the lexer create single-character tokens only, and let the parser glue these together (much like a PEG). Something like this:

regExp            :     branch ( '|' branch )* ;
branch            :     piece* ;
piece             :     atom quantifier? ;
quantifier        :     Quantifiers | '{'quantity'}' ;
quantity          :     quantRange | quantMin | quantExact ;
quantRange        :     quantExact ',' quantExact ;
quantMin          :     quantExact ',' ;
atom              :     normalChar | '(' regExp ')' ; 
normalChar        :     NormalChar | Digit ;
quantExact        :     Digit+ ;

Digit             :     [0-9] ;
NormalChar        :     ~[.\\?*+{}()|\[\]] ;
Quantifiers       :     [?*+] ;
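The single-character approach can be sketched in Python as follows. This is a hand-written imitation of the grammar above, assuming a tuple-based tree and helper names (`tokenize`, `quant_exact`, `parse`, and so on) that are not part of ANTLR. The point it tries to show is that every character becomes its own token and the parser decides, from context, whether a run of Digit tokens is a quantity or a sequence of ordinary atoms.

```python
SPECIAL = set(".\\?*+{}()|[]")   # characters excluded from NormalChar

def tokenize(text):
    """One token per character, mirroring Digit / NormalChar / Quantifiers."""
    tokens = []
    for ch in text:
        if ch in "{},()|":
            kind = ch                      # literals used in the parser rules
        elif ch in "0123456789":
            kind = "Digit"
        elif ch in "?*+":
            kind = "Quantifiers"
        elif ch not in SPECIAL:
            kind = "NormalChar"
        else:
            raise ValueError(f"no rule matches {ch!r}")
        tokens.append((kind, ch))
    return tokens + [("EOF", "")]          # sentinel, so lookahead never overruns

def expect(toks, i, kind):
    if toks[i][0] != kind:
        raise SyntaxError(f"expected {kind!r} at token {i}")
    return i + 1

def quant_exact(toks, i):
    """quantExact : Digit+ ; glue consecutive Digit tokens into one int."""
    start = i
    while toks[i][0] == "Digit":
        i += 1
    if i == start:
        raise SyntaxError(f"expected a digit at token {i}")
    return int("".join(t[1] for t in toks[start:i])), i

def quantifier(toks, i):
    if toks[i][0] == "Quantifiers":
        return toks[i][1], i + 1
    if toks[i][0] != "{":
        return None, i                     # the quantifier is optional
    low, i = quant_exact(toks, i + 1)
    high = low                             # plain {n}
    if toks[i][0] == ",":
        i, high = i + 1, None              # {n,} unless digits follow
        if toks[i][0] == "Digit":
            high, i = quant_exact(toks, i)
    return (low, high), expect(toks, i, "}")

def branch(toks, i):
    pieces = []
    while toks[i][0] in ("Digit", "NormalChar", "("):
        if toks[i][0] == "(":
            atom, i = reg_exp(toks, i + 1)
            i = expect(toks, i, ")")
        else:
            atom, i = toks[i][1], i + 1    # normalChar : NormalChar | Digit
        quant, i = quantifier(toks, i)
        pieces.append((atom, quant))
    return pieces, i

def reg_exp(toks, i):
    first, i = branch(toks, i)
    branches = [first]
    while toks[i][0] == "|":
        nxt, i = branch(toks, i + 1)
        branches.append(nxt)
    return branches, i

def parse(text):
    toks = tokenize(text)
    tree, i = reg_exp(toks, 0)
    expect(toks, i, "EOF")
    return tree

print(parse("12{3,4}"))
```

Here 1 and 2 come back as separate atoms and only the 2 carries the {3,4} quantifier, while multi-digit quantities such as {14,15} are still read as whole numbers by `quant_exact`.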

Answer 1: accepted 2014-06-13 18:01:06
