Understanding lexer rule resolution in ANTLR4

I'm reading The Definitive ANTLR 4 Reference and have now reached the section about lexer rule resolution. Here is what's written in this section:

grammar KeywordTest;
enumDef : 'enum' '{' ... '}';
...
FOR: 'for';
...
ID: [a-zA-Z]+; // does not match 'enum' or 'for'

Rule ID could also match keywords such as enum or for, which means there's more than one rule that could match the same string. [...] Literals such as 'enum' become lexical rules and go immediately after the parser rules but before the explicit lexical rules.
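To make the quoted ordering concrete, here is a minimal Python sketch (not ANTLR's actual implementation) of how the KeywordTest lexer would resolve its rules. It assumes the standard ANTLR 4 policy: at each position the longest match wins, and on a tie the rule listed first wins. The implicit tokens created from the literals in enumDef are placed before the explicit rules FOR and ID. The T__0-style names mimic what ANTLR 4 generates for implicit tokens, but the numbering here is illustrative, and whitespace skipping stands in for a hypothetical WS rule.

```python
import re

# Sketch of KeywordTest's lexer rule order: implicit tokens from literals in
# parser rules first (in order of appearance), then the explicit lexer rules.
RULES = [
    ("T__0", re.escape("enum")),  # implicit token from 'enum' in enumDef
    ("T__1", re.escape("{")),     # implicit token from '{' in enumDef
    ("T__2", re.escape("}")),     # implicit token from '}' in enumDef
    ("FOR", re.escape("for")),    # explicit lexer rule FOR
    ("ID", r"[a-zA-Z]+"),         # explicit lexer rule ID
]


def tokenize(text, rules):
    """Split text into (token_type, lexeme) pairs.

    At each position the longest match wins; on a tie, the rule listed
    first wins. Whitespace is skipped (standing in for a WS -> skip rule).
    """
    tokens = []
    pos = 0
    while pos < len(text):
        if text[pos].isspace():
            pos += 1
            continue
        best_name, best_len = None, 0
        for name, pattern in rules:
            m = re.compile(pattern).match(text, pos)
            # Strictly longer only, so an earlier rule keeps a tie.
            if m and len(m.group()) > best_len:
                best_name, best_len = name, len(m.group())
        if best_name is None:
            raise ValueError(f"no rule matches at position {pos}")
        tokens.append((best_name, text[pos:pos + best_len]))
        pos += best_len
    return tokens


print(tokenize("enum { for }", RULES))
# [('T__0', 'enum'), ('T__1', '{'), ('FOR', 'for'), ('T__2', '}')]
print(tokenize("enumeration", RULES))
# [('ID', 'enumeration')]
```

"enum" ties between the implicit token and ID, so the earlier implicit token wins, while "enumeration" is longer than "enum" and therefore goes to ID.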

What does this mean, and how does it help resolve the potential ambiguities? I would guess that a declaration like

ENUM_KEYWORD: 'enum';

which ANTLR4 might create internally, would be placed right after the rule enumDef: 'enum' '{' ... '}'; and the result would look as follows:

enumDef: ENUM_KEYWORD '{' ... '}';
ENUM_KEYWORD: 'enum';

Is that exactly how ANTLR4 does things?

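The order of lexer rules is very important in a grammar: when several lexer rules match the same input, the one defined first is used. (Strictly, ANTLR first prefers the rule that matches the longest input; rule order breaks ties between matches of equal length.) You can read more here.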

So if you have these lexer rules:

ID: [a-zA-Z]+;
FOR: 'for';

the input "for" will be tokenized as either FOR or ID depending on the rule order, because both rules match it. With the order shown above, ID comes first, so "for" becomes an ID token.
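The effect of rule order can be checked with a small Python simulation. This is a sketch, not ANTLR itself: it assumes ANTLR 4's resolution policy (longest match first, then the earliest rule on a tie) and tries the two rules above in both orders. The names id_first and for_first are hypothetical.

```python
import re


def tokenize(text, rules):
    """Longest match wins; on a tie, the rule listed first wins."""
    tokens = []
    pos = 0
    while pos < len(text):
        if text[pos].isspace():  # stand-in for a WS -> skip rule
            pos += 1
            continue
        best_name, best_len = None, 0
        for name, pattern in rules:
            m = re.compile(pattern).match(text, pos)
            if m and len(m.group()) > best_len:  # ties keep the earlier rule
                best_name, best_len = name, len(m.group())
        if best_name is None:
            raise ValueError(f"no rule matches at position {pos}")
        tokens.append((best_name, text[pos:pos + best_len]))
        pos += best_len
    return tokens


# The same two rules, listed in opposite orders.
id_first = [("ID", r"[a-zA-Z]+"), ("FOR", re.escape("for"))]
for_first = [("FOR", re.escape("for")), ("ID", r"[a-zA-Z]+")]

print(tokenize("for", id_first))     # [('ID', 'for')]   ID shadows FOR
print(tokenize("for", for_first))    # [('FOR', 'for')]  keyword rule wins the tie
print(tokenize("format", for_first)) # [('ID', 'format')]  longer match beats order
```

This is why keyword rules are usually listed before a general identifier rule: order only decides ties, so identifiers such as "format" are still recognized as ID.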

As a result, grammars often contain a parser rule, here called ambigous, that lists all the keywords, so that input containing a keyword is still accepted where a general identifier-like sequence is expected.

For example:

alfaNum: (ALFA | NUM | ambigous | '_' )+?;
ambigous: SELECT | WHERE | FROM | WITH | SET | AS;

This way, if the input "selection" is parsed with alfaNum, it will pass. If ambigous were not included, it would fail, because the lexer rule SELECT: 'select'; turns the leading "select" into a SELECT token, which alfaNum would not otherwise accept.
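The following Python sketch illustrates the failure and the fix. It assumes ANTLR 4's lexer policy (longest match, then earliest rule). ALFA and NUM are not defined in the answer, so their definitions here are hypothetical: ALFA is assumed to match a single letter and NUM a run of digits. The helper matches_alfa_num is also hypothetical; it checks only that every token is one alfaNum accepts, which is what the repetition (...)+? requires.

```python
import re

# Hypothetical lexer rules: the keywords from the answer, plus assumed
# definitions for ALFA (one letter) and NUM (digits).
RULES = [
    ("'_'", re.escape("_")),          # implicit token from '_' in alfaNum
    ("SELECT", re.escape("select")),
    ("WHERE", re.escape("where")),
    ("FROM", re.escape("from")),
    ("WITH", re.escape("with")),
    ("SET", re.escape("set")),
    ("AS", re.escape("as")),
    ("ALFA", r"[a-zA-Z]"),            # assumption: a single letter
    ("NUM", r"[0-9]+"),               # assumption: a run of digits
]
KEYWORDS = {"SELECT", "WHERE", "FROM", "WITH", "SET", "AS"}


def tokenize(text, rules):
    """Longest match wins; on a tie, the rule listed first wins."""
    tokens = []
    pos = 0
    while pos < len(text):
        best_name, best_len = None, 0
        for name, pattern in rules:
            m = re.compile(pattern).match(text, pos)
            if m and len(m.group()) > best_len:
                best_name, best_len = name, len(m.group())
        if best_name is None:
            raise ValueError(f"no rule matches at position {pos}")
        tokens.append((best_name, text[pos:pos + best_len]))
        pos += best_len
    return tokens


def matches_alfa_num(tokens, with_ambigous):
    """Sketch of alfaNum: (ALFA | NUM | ambigous | '_')+? over a token list."""
    allowed = {"ALFA", "NUM", "'_'"}
    if with_ambigous:
        allowed |= KEYWORDS  # ambigous: SELECT | WHERE | FROM | WITH | SET | AS
    return len(tokens) > 0 and all(kind in allowed for kind, _ in tokens)


tokens = tokenize("selection", RULES)
print(tokens)
# [('SELECT', 'select'), ('ALFA', 'i'), ('ALFA', 'o'), ('ALFA', 'n')]
print(matches_alfa_num(tokens, with_ambigous=True))   # True
print(matches_alfa_num(tokens, with_ambigous=False))  # False
```

The problem depends on the assumed single-letter ALFA. If ALFA were [a-zA-Z]+, "selection" would lex as one ALFA token, because a 9-character match beats SELECT's 6 characters, and ambigous would not be needed for this input.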
