
How to match any symbol in ANTLR parser (not lexer)?

How to match any symbol in ANTLR parser (not lexer)? Where is the complete language description for ANTLR4 parsers?

UPDATE

Is the answer "impossible"?

You first need to understand the role of each part in parsing:

The lexer: this is the object that tokenizes your input string. Tokenizing means converting a stream of input characters into abstract token symbols (usually just numbers).

The parser: this is the object that works only with tokens to determine the structure of a language. A language (written as one or more grammar files) defines the token combinations that are valid.

As you can see, the parser doesn't even know what a letter is. It only knows tokens. So your question is already wrong.

Having said that, it would probably help to know why you want to skip individual input letters in your parser. It looks like your base concept needs adjusting.
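
The split between lexer and parser described above can be sketched in a small combined grammar. The grammar name Assign and the rules assignment, ID, INT and WS are made up for illustration; this is only a sketch of the convention, not anything from the question:

```antlr
grammar Assign;   // hypothetical grammar name, for illustration only

// Parser rules (names start with a lowercase letter) only ever see tokens.
// Literals such as '=' and ';' become implicit tokens in a combined grammar.
assignment : ID '=' INT ';' ;

// Lexer rules (names start with an uppercase letter) turn characters into tokens.
ID  : [a-zA-Z_]+ ;
INT : [0-9]+ ;
WS  : [ \t\r\n]+ -> skip ;   // whitespace is dropped before the parser runs
```

For an input like x = 42; the parser would receive four tokens (ID, '=', INT, ';') and never the individual characters.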

It depends what you mean by "symbol". To match any token inside a parser rule, use the . (DOT) meta char. If you're trying to match any character inside a parser rule, then you're out of luck: there is a strict separation between parser and lexer rules in ANTLR. It is not possible to match any character inside a parser rule.
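
A minimal sketch of the token wildcard, assuming a hypothetical grammar named AnyToken whose rules and tokens (block, BEGIN, END, WORD, WS) are invented for illustration. It is only meant to show that . in a parser rule consumes one token of any type, not one character:

```antlr
grammar AnyToken;   // hypothetical grammar name, for illustration only

// Parser rule: '.' matches exactly one token of any type.
// '.*?' is the non-greedy form: consume tokens up to the first END.
block : BEGIN .*? END ;

// Lexer rules: these decide what counts as a token.
// BEGIN and END are listed before WORD so they win ties on 'begin' and 'end'.
BEGIN : 'begin' ;
END   : 'end' ;
WORD  : [a-zA-Z0-9]+ ;
WS    : [ \t\r\n]+ -> skip ;
```

With the input begin foo 42 end, the wildcard would consume the two WORD tokens foo and 42. Whitespace never reaches the parser because the lexer skips it, and a character that no lexer rule matches is reported by the lexer rather than matched by the wildcard.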

It is possible, but only if you have such a basic grammar that the reason to use ANTLR is negated anyway.

If you had the grammar:

text     : ANY_CHAR* ;
ANY_CHAR : . ;

it would do what you (seem to) want.

However, as many have pointed out, this would be a pretty strange thing to do. The purpose of the lexer is to identify different tokens that can be strung together in the parser to form a grammar, so your lexer can either identify the specific string "JSTL/EL" as a token, or [A-Z] '/EL', [A-Z] '/'[A-Z][A-Z], etc - depending on what you need.

The parser is then used to define the grammar, so:

phrase     : CHAR* jstl CHAR* ;
jstl       : JSTL SLASH QUALIFIER ;

JSTL       : 'JSTL' ;
SLASH      : '/' ;
QUALIFIER  : [A-Z][A-Z] ;
CHAR       : . ;

would accept "blah blah JSTL/EL..." as input, but not "blah blah EL/JSTL...".

I'd recommend looking at The Definitive ANTLR 4 Reference, in particular the section on "Islands in the stream" and the Grammar Reference (Ch 15) that specifically deals with Unicode.
