Antlr lexer matching unintended rule

I'm re-learning some basic Antlr and trying to write a grammar to generate todo items:

Meeting at 12pm for 20 minutes

The issue I'm having is that three lexer rules in particular are getting "mismatched" depending on the context in which they're used:

HOUR: [0-9]|'1'[0-9]|'2'[0-3];
MINUTE: [0-5][0-9];
NONZERO_NUMBER: [1-9][0-9]*;

There are some cases in which I want 12 to match the HOUR rule, and other times when I want it to match MINUTE , etc., but the parser rules don't seem to be able to influence the lexer to be context-sensitive.

For example, the string above ( Meeting at 12pm for 20 minutes ) does not parse: the 12 is matched as an HOUR , but so is the 20 , and because the parser is expecting NONZERO_NUMBER at that point, it fails.

line 1:20 mismatched input '20' expecting NONZERO_NUMBER

If I change the duration value to intentionally not match the HOUR rule, it's fine:

Meeting at 12pm for 120 minutes // Note 120 minutes doesn't match HOUR or MINUTE

Is there any way to "convince" the lexer to try to match the expected token (as defined for the parser) before trying other/earlier rules?

Here's my full grammar for clarity:

Sidenote: I realize there are other oddities, like an event name can only be a single word, but I'm tackling one problem at a time.

grammar Sprint;

event: eventName timePhrase? durationPhrase?;

durationPhrase: 'for' duration;

timePhrase: 'at' time;

duration: (NONZERO_NUMBER MINUTE_STR) | (NONZERO_NUMBER HOUR_STR);

time: ((HOUR ':' MINUTE) | (HOUR)) AMPM?;

eventName: WORD;

MINUTE_STR: 'minute'('s')?;

HOUR_STR: 'hour'('s')?;

HOUR: [0-9]|'1'[0-9]|'2'[0-3];

MINUTE: [0-5][0-9];

NONZERO_NUMBER: [1-9][0-9]*;

AMPM: ('A'|'a'|'P'|'p')('M'|'m');

WORD: ('a'..'z' | 'A'..'Z')+;

WS: (' '|[\n\t\r]) -> skip;

It's usually a mistake to try to do the work of the parser in the lexer. If the lexer just recognises integers, the parser will have no problem sorting out how to interpret the number. You can reject times like 8:63 in an action or predicate.
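
As a rough sketch of that suggestion, the grammar from the question could be reworked so that the lexer emits a single integer token and the parser rules decide what each number means. The rule names INT, hour, minute and amount are introduced here for illustration and are not part of the original grammar. The predicates assume the Java target, and this is one possible arrangement rather than the only way to do it:

```antlr
grammar Sprint;

event: eventName timePhrase? durationPhrase?;

durationPhrase: 'for' duration;

timePhrase: 'at' time;

// The parser, not the lexer, decides what each INT means from its position.
duration: amount (MINUTE_STR | HOUR_STR);

time: hour (':' minute)? AMPM?;

// Range checks as semantic predicates (Java target assumed).
// $INT.int is the matched token's text converted to an integer.
hour: INT {$INT.int <= 23}?;

minute: INT {$INT.int <= 59}?;

amount: INT {$INT.int > 0}?;

eventName: WORD;

MINUTE_STR: 'minute' 's'?;

HOUR_STR: 'hour' 's'?;

AMPM: [AaPp] [Mm];

// One token type for every run of digits, so 12, 20 and 120 all lex the same way.
INT: [0-9]+;

WORD: [a-zA-Z]+;

WS: [ \t\r\n]+ -> skip;
```

With this layout, both 12 and 20 in "Meeting at 12pm for 20 minutes" are lexed as INT, and the surrounding parser rule determines whether the value is treated as an hour or a duration. Unlike the original MINUTE rule, the minute predicate here does not insist on exactly two digits. A failed predicate is reported as a parse error, so the same checks could instead be done in a listener or visitor after parsing if friendlier error messages are wanted.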

Is there any way to "convince" the lexer to try to match the expected token (as defined for the parser) before trying other/earlier rules?

No, you cannot convince the lexer to match the expected token, because the lexer does not have any expectations (formally, it operates on a regular grammar while the parser operates on a context-free grammar). The lexer and parser operate independently * ; you could theoretically run the lexer first without any parser and only then start the parser on the lexer's output.


* There is one exception to this in ANTLR 3 (I couldn't find whether this is true for ANTLR 4 as well): the ANTLR 3 parser and lexer share an org.antlr.runtime.RecognizerSharedState instance. However, using this to affect how the lexer matches tokens would still be risky, since you don't have direct control over when the lexer tokenizes a particular piece of input (i.e. it can look ahead because of some parser rule and tokenize the input before you reach it in the parser and attempt to affect it).
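
To see why the parser's expectations make no difference, it may help to trace the lexer's decision by hand. The fragment below repeats the three rules from the question in their original order. The comments are annotations added here and assume ANTLR 4's documented behaviour: at each position the lexer takes the longest match, and when several rules match the same number of characters, the rule defined first wins.

```antlr
// Longest prefix each rule can match for the inputs 12, 20, 45 and 120:
//
//                                   "12"  "20"  "45"  "120"
HOUR: [0-9]|'1'[0-9]|'2'[0-3];    //   2     2     1     2
MINUTE: [0-5][0-9];               //   2     2     2     2
NONZERO_NUMBER: [1-9][0-9]*;      //   2     2     2     3

// Resulting token type, whatever the parser expects at that point:
//   "12"  -> HOUR            (three-way tie, HOUR is defined first)
//   "20"  -> HOUR            (three-way tie, HOUR is defined first)
//   "45"  -> MINUTE          (tie between MINUTE and NONZERO_NUMBER)
//   "120" -> NONZERO_NUMBER  (the only rule that consumes all three characters)
```

This matches the behaviour reported in the question: 20 comes out as HOUR and is rejected where NONZERO_NUMBER is required, while 120 happens to work because only NONZERO_NUMBER can match all three digits.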
