
How to match any symbol in ANTLR parser (not lexer)?

How can I match any symbol in an ANTLR parser rule (not a lexer rule)? And where is the complete language description for ANTLR4 parsers?

UPDATE

Is the answer "impossible"?

You first need to understand the roles of each part in parsing:

The lexer: this is the object that tokenizes your input string. Tokenizing means to convert a stream of input characters to an abstract token symbol (usually just a number).

The parser: this is the object that only works with tokens to determine the structure of a language. A language (written as one or more grammar files) defines the token combinations that are valid.

As you can see, the parser doesn't even know what a letter is. It only knows tokens. So your question is already wrong.

Having said that, it would probably help to know why you want to skip individual input letters in your parser. It looks like your basic concept needs adjusting.

It depends on what you mean by "symbol". To match any token inside a parser rule, use the . (DOT) meta char. If you're trying to match any character inside a parser rule, you're out of luck: ANTLR strictly separates parser rules from lexer rules, so a parser rule cannot match individual characters.
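The distinction can be sketched with a small ANTLR 4 grammar. The grammar name and all rule names here (AnyToken, block, OPEN, CLOSE, WORD, WS, OTHER) are made up for illustration; treat it as a sketch rather than a drop-in solution.

```antlr
grammar AnyToken;

// Parser rules (lower-case names) see only tokens.
// Here '.' stands for exactly one token of any type.
block : OPEN .*? CLOSE ;       // accept whatever tokens sit between '{' and the next '}'

// Lexer rules (upper-case names) see only characters.
OPEN  : '{' ;
CLOSE : '}' ;
WORD  : [a-zA-Z]+ ;
WS    : [ \t\r\n]+ -> skip ;   // whitespace never reaches the parser
OTHER : . ;                    // here '.' stands for exactly one character
```

The same dot appears twice with different meanings: in block it consumes whole tokens (a WORD such as "hello" counts as one), while in OTHER it consumes a single character that no other lexer rule claimed.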

It is possible, but only if you have such a basic grammar that the reason to use ANTLR is negated anyway.

If you had the grammar:

text     : ANY_CHAR* ;
ANY_CHAR : . ;

it would do what you (seem to) want.

However, as many have pointed out, this would be a pretty strange thing to do. The purpose of the lexer is to identify different tokens that can be strung together in the parser to form a grammar, so your lexer can either identify the specific string "JSTL/EL" as a token, or [A-Z] '/EL', [A-Z] '/' [A-Z][A-Z], etc., depending on what you need.

The parser is then used to define the grammar, so:

phrase     : CHAR* jstl CHAR* ;
jstl       : JSTL SLASH QUALIFIER ;

JSTL       : 'JSTL' ;
SLASH      : '/' ;
QUALIFIER  : [A-Z][A-Z] ;
CHAR       : . ;

would accept "blah blah JSTL/EL..." as input, but not "blah blah EL/JSTL...".

I'd recommend looking at The Definitive ANTLR 4 Reference, in particular the section on "Islands in the stream" and the Grammar Reference (Ch 15) that specifically deals with Unicode.
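As a rough idea of what an "island" grammar can look like in ANTLR 4, here is a minimal sketch using lexer modes. It assumes the interesting parts of the input are delimited by "${" and "}", which is an assumption made for the example and not something stated in the question; the grammar names, rule names and the EXPR mode name are likewise hypothetical. The two grammars would live in separate files.

```antlr
// File: IslandLexer.g4
lexer grammar IslandLexer;

// Default mode: the "sea" of arbitrary text.
OPEN   : '${' -> pushMode(EXPR) ;   // start of an island
TEXT   : ~[$]+ ;                    // any run of characters other than '$'
DOLLAR : '$' ;                      // a lone '$' that does not start an island

mode EXPR;
// Inside an island only these tokens are recognised.
CLOSE  : '}' -> popMode ;           // back to the sea
ID     : [a-zA-Z_] [a-zA-Z_0-9]* ;
DOT    : '.' ;
WS     : [ \t\r\n]+ -> skip ;

// File: IslandParser.g4
parser grammar IslandParser;
options { tokenVocab = IslandLexer; }

document : (TEXT | DOLLAR | expr)* EOF ;
expr     : OPEN ID (DOT ID)* CLOSE ;
```

The lexer does the work of swallowing arbitrary characters as TEXT tokens, so the parser still deals only with tokens and never needs to match a raw character.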
