ANTLR4 parsing subrules
I have a grammar that works fine when parsing an entire file in one pass. Now I wish to break the parsing up into components and run the parser on sub-rules. I ran into an issue that I assume others parsing sub-rules will also see, with the following rule:
thing : LABEL? THING THINGDATA thingClause?
//{System.out.println("G4 Lexer/parser thing encountered");}
;
...
thingClause : ',' ID ( ',' ID)?
;
When the above rule is parsed from a top-level start rule that parses to EOF, everything works fine. When it is parsed as a sub-rule (without parsing to EOF), the parser complains whenever there is no thingClause, because it expects to see either a "," character or EOF:
line 8:0 mismatched input '%' expecting {<EOF>, ','}
When I parse to EOF, the % is correctly parsed as part of another "thing", because the top-level rule looks for:
toprule : thing+
| endOfThingsTokens
;
And endOfThingsTokens occurs before EOF, so I expect this is why the top-level rule works.
For parsing the sub-rule, I want the ANTLR4 parser to accept or ignore the % token and say "OK, we aren't seeing a thingClause", then reset the token stream so that the next thing can be parsed by a different instance of the parser.
In this specific case I could change the lexer to pass newlines to the parser, which I currently skip in the lexer grammar. That would require many other changes to accept newlines in the token stream, which are currently not needed.
Essentially, I need some way to give the rule an "end of record" token. But I was wondering whether there is some way to solve this with a semantic predicate, something like:
thing : { if comma before %}? LABEL? THING THINGDATA thingClause?
| LABEL? THING THINGDATA
;
...
thingClause : ',' ID ( ',' ID)?
;
The above predicate pseudocode would hide the optional thingClause? when it cannot be satisfied, so that the parser would stop after parsing one "thing" without looking for a specific "end of thing" token (i.e. a newline).
If I solve this I will post the answer.
The parser will (effectively) look ahead in the token stream to determine whether the current rule can be satisfied. The corresponding tokens are then consumed. If any look-ahead tokens remain unconsumed, the parser looks for another rule against which to consume these and additional look-ahead tokens.
The thingClause? element, when not matched, will result in unconsumed tokens in the parser. Hence the error you are seeing.
The parser look-ahead is data dependent, meaning that the evaluation of the elements of a rule can easily read more tokens into the parser than the current rule could possibly consume.
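The gap between tokens read and tokens consumed can be illustrated with a toy buffer. This is a simplified model, not ANTLR's actual implementation; LookaheadBuffer and its methods are hypothetical names, and plain strings stand in for tokens.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Toy model of a buffered token stream (hypothetical, not ANTLR's class).
class LookaheadBuffer {
    private final Iterator<String> source;            // stands in for the lexer
    private final List<String> buffer = new ArrayList<>();
    private int pos = 0;                               // index of the next token to consume

    LookaheadBuffer(Iterator<String> source) {
        this.source = source;
    }

    // Peek at the k-th token ahead (k >= 1), pulling from the source as needed.
    String la(int k) {
        while (buffer.size() < pos + k && source.hasNext()) {
            buffer.add(source.next());
        }
        int i = pos + k - 1;
        return i < buffer.size() ? buffer.get(i) : "<EOF>";
    }

    // Consume the current token, if there is one.
    void consume() {
        la(1);
        if (pos < buffer.size()) {
            pos++;
        }
    }

    int fetched()  { return buffer.size(); }  // tokens pulled from the source so far
    int consumed() { return pos; }            // tokens actually consumed so far
}
```

Peeking three tokens ahead with la(3) on the stream THING THINGDATA % THING pulls three tokens out of the source. If the rule then consumes only two of them, the % has already left the lexer even though this rule never matched it.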
While a predicate could help, it will not make the problem deterministic. That is, even if the parser matches the non-predicated alt, it may well have read more tokens into the parser than can be consumed by that alt.
The only way to avoid this non-determinism would be to pre-inject <EOF> tokens into the token stream at the sub-rule boundaries.
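A minimal sketch of that idea, written against plain strings rather than real ANTLR tokens so that it stays self-contained. It assumes, purely for illustration, that a record boundary can be recognised from the first token of each "thing" (here the '%' from the error message); the class RecordSplitter, the split method and the startsRecord test are all hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical helper: cuts a flat token list into per-record slices.
class RecordSplitter {
    // Sentinel standing in for ANTLR's <EOF> token in this toy model.
    static final String EOF = "<EOF>";

    // Returns one list per record, each terminated by the EOF sentinel.
    // startsRecord is an assumed test for "this token begins a new record".
    static List<List<String>> split(List<String> tokens, Predicate<String> startsRecord) {
        List<List<String>> records = new ArrayList<>();
        List<String> current = null;
        for (String tok : tokens) {
            // A record-start token closes the record in progress, if any.
            if (current != null && startsRecord.test(tok)) {
                current.add(EOF);
                current = null;
            }
            if (current == null) {
                current = new ArrayList<>();
                records.add(current);
            }
            current.add(tok);
        }
        // Terminate the final record as well.
        if (current != null) {
            current.add(EOF);
        }
        return records;
    }
}
```

Each slice now ends in its own end-of-input marker, so a parser started on the thing rule would see EOF where thingClause? is absent instead of the first token of the next record. With the real Java runtime, each slice of Token objects could presumably be wrapped in a ListTokenSource, which supplies an EOF token once its list is exhausted, and handed to a fresh parser through a CommonTokenStream; that wiring is not shown here.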