ANTLR grammar not working as expected. What am I doing wrong?

Question

I have this grammar below for implementing an IN operator taking a list of numbers or strings.

grammar listFilterExpr;

entityIdNumberProperty
    : 'a.Id'
    | 'c.Id'
    | 'e.Id'
    ;
    
entityIdStringProperty
    : 'f.phone'
    ;

listFilterExpr
    : entityIdNumberListFilter
    | entityIdStringListFilter
    ;

listOperator
    : '$in:'
    ;

entityIdNumberListFilter
    : entityIdNumberProperty listOperator numberList
    ;

entityIdStringListFilter
    : entityIdStringProperty listOperator stringList
    ;

numberList: '[' ID (',' ID)* ']';

fragment ID: [1-9][0-9]*;

stringList: '[' STRING (',' STRING)* ']';

STRING
    : '"' (ESC | SAFECODEPOINT)* '"'
    ;

fragment ESC
   : '\\' (["\\/bfnrt] | UNICODE)
   ;
   
fragment SAFECODEPOINT
   : ~ ["\\\u0000-\u001F]
   ;

If I try to parse the following input:

c.Id $in: [1,1]

Then I get the following error in the parser:

mismatched input '1' expecting ID

Please help me to correct this grammar.

Update

I found the following rule much earlier in my project's large grammar file. It might be matching '1' before ID gets a chance to:

NUMBER
   : '-'? INT ('.' [0-9] +)?
   ;
fragment INT
   : '0' | [1-9] [0-9]*
   ;

But if I put my ID rule before NUMBER, other things fail, because input that should have matched NUMBER has already been matched as ID.

What should I do?

Answer 1

As mentioned by rici: ID should not be a fragment. Fragments can only be used by other lexer rules; they never become tokens on their own (and therefore cannot be used in parser rules).

Just remove the fragment keyword from it: ID: [1-9][0-9]*;

Note that you'll also have to account for spaces. You probably want to skip them:

SPACES : [ \t\r\n] -> skip;
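Putting those two changes together, a minimal sketch of the question's grammar might look like the following. It assumes no other lexer rule in the file also matches digits. The grammar name ListFilter and the UNICODE and HEX fragments are assumptions: the question references UNICODE without showing its definition, so the usual JSON-style definition is filled in here.

```antlr
grammar ListFilter;   // name is an assumption; the file would be ListFilter.g4

listFilterExpr
    : entityIdNumberListFilter
    | entityIdStringListFilter
    ;

entityIdNumberListFilter : entityIdNumberProperty listOperator numberList ;
entityIdStringListFilter : entityIdStringProperty listOperator stringList ;

entityIdNumberProperty : 'a.Id' | 'c.Id' | 'e.Id' ;
entityIdStringProperty : 'f.phone' ;

listOperator : '$in:' ;

numberList : '[' ID (',' ID)* ']' ;
stringList : '[' STRING (',' STRING)* ']' ;

// No "fragment" keyword, so the lexer emits ID tokens the parser can use.
ID : [1-9] [0-9]* ;

STRING : '"' (ESC | SAFECODEPOINT)* '"' ;

fragment ESC           : '\\' (["\\/bfnrt] | UNICODE) ;
fragment UNICODE       : 'u' HEX HEX HEX HEX ;   // assumed; not shown in the question
fragment HEX           : [0-9a-fA-F] ;           // assumed helper for UNICODE
fragment SAFECODEPOINT : ~ ["\\\u0000-\u001F] ;

// Whitespace is discarded, so the spaces in "c.Id $in: [1,1]" never reach the parser.
SPACES : [ \t\r\n]+ -> skip ;
```

With this grammar, the input c.Id $in: [1,1] should tokenize as 'c.Id', '$in:', '[', ID, ',', ID, ']'.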

... mismatched input '1' expecting ID ...

This looks like there's another lexer rule, besides ID, that also matches the input 1 and is defined before ID. In that case, have a look at this Q&A: ANTLR 4.5 - Mismatched Input 'x' expecting 'x'

EDIT

Because you have the rules ordered like this:

NUMBER
   : '-'? INT ('.' [0-9] +)?
   ;

fragment INT
   : '0' | [1-9] [0-9]*
   ;

ID
   : [1-9][0-9]*
   ;

the lexer will never create an ID token (only NUMBER tokens will be created). This is just how ANTLR works: the lexer prefers the rule that matches the most characters, and when two or more lexer rules match the same number of characters, the one defined first "wins".

I do think it's odd to have an ID rule that matches only digits, but if that's the language you're parsing, fine. In your case, you could do something like this:

id     : POS_NUMBER;
number : POS_NUMBER | NEG_NUMBER;

POS_NUMBER : INT ('.' [0-9] +)?;
NEG_NUMBER : '-' POS_NUMBER;

fragment INT
   : '0' | [1-9] [0-9]*
   ;

and then use id instead of ID in your parser rules, and number instead of the NUMBER token you're using now.
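Applied to the numberList rule from the question, that suggestion could be sketched as the small standalone grammar below. The grammar name NumberListSketch and the trailing EOF are assumptions added to make the example self-contained.

```antlr
grammar NumberListSketch;   // hypothetical name, for illustration only

// The parser rule "id" replaces the former ID token.
numberList : '[' id (',' id)* ']' EOF ;

id     : POS_NUMBER ;
number : POS_NUMBER | NEG_NUMBER ;   // for the rules that previously used NUMBER

// One lexer rule per distinct character shape, so no two rules compete for "1".
POS_NUMBER : INT ('.' [0-9]+)? ;
NEG_NUMBER : '-' POS_NUMBER ;

fragment INT : '0' | [1-9] [0-9]* ;

SPACES : [ \t\r\n]+ -> skip ;
```

One trade-off of this design: id now accepts anything POS_NUMBER accepts, including 0 and 1.5, which the original ID rule rejected. If that matters, the stricter check would have to happen after parsing, for example in a listener or visitor.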
