ANTLR4 grammar not behaving as expected

I have some data that needs to be parsed. I am using the ANTLR4 tool to generate Java parsers and lexers, which I can then use to build structured data from the input shown below.

Grammar:

grammar SUBDATA;
subdata:
    data+;
data:
    array;
array:
    '[' obj (',' obj)* ']';
intarray:
    '[' number (',' number)* ']';
number:
    INT;
obj:
    '{' pair (',' pair)* '}';
pair:
    key '=' value;
key:
    WORD;
value:
    INT | WORD | intarray;
WORD:
    [A-Za-z0-9]+;
INT:
    [0-9]+;
WS:
    [ \t\n\r]+ -> skip;

Test Input Data:

[
    {OmedaDemographicType=1, OmedaDemographicId=100, OmedaDemographicValue=4}, 
    {OmedaDemographicType=1, OmedaDemographicId=101, OmedaDemographicValue=26}, 
    {
        OmedaDemographicType=2, OmedaDemographicId=102, OmedaDemographicValue=[16,34]
    }
]

Output:

line 5:79 mismatched input '16' expecting INT
line 5:82 mismatched input '34' expecting INT

[Image: GUI parse tree output]

The parser fails even though there is an integer value at the reported position.

You've made the classic mistake of not ordering your lexer rules properly. You should read and understand the priority rules and their consequences.

In your case, INT will never be able to match, since the WORD rule can match everything the INT rule can and is defined first in the grammar. When two lexer rules match the same amount of input, the one defined first wins, so the 16 and 34 from the example are lexed as WORDs.

You should remove the ambiguity by not allowing a word to start with a digit:

WORD:
    [A-Za-z] [A-Za-z0-9]*;
INT:
    [0-9]+;

Or by swapping the order of the rules:

INT:
    [0-9]+;
WORD:
    [A-Za-z0-9]+;

With this ordering, a purely numeric token is always an INT, so you can't have words that are fully numeric, but a word can still start with a digit.
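
To see how the three rule orderings treat the same text, the selection logic can be modelled in a few lines of plain Java. This is only a rough sketch, not ANTLR itself: the class `LexerPriorityDemo` and its `typeOf` and `rules` helpers are hypothetical names made up for illustration, and it assumes that `java.util.regex` greedy matching stands in for the lexer's longest match (which holds for simple character-class rules like these).

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Simplified model of ANTLR4 lexer rule selection (NOT the ANTLR runtime):
// the longest match wins, and a tie goes to the rule defined first.
final class LexerPriorityDemo {

    // Returns the name of the rule that claims the token at the start of `input`,
    // or null if no rule matches. `rules` must iterate in grammar definition order.
    static String typeOf(Map<String, String> rules, String input) {
        String best = null;
        int bestLen = 0;
        for (Map.Entry<String, String> rule : rules.entrySet()) {
            Matcher m = Pattern.compile(rule.getValue()).matcher(input);
            // Strictly greater: an equal-length match never displaces an earlier rule.
            if (m.lookingAt() && m.end() > bestLen) {
                best = rule.getKey();
                bestLen = m.end();
            }
        }
        return best;
    }

    // Builds a two-rule set that preserves definition order.
    static Map<String, String> rules(String name1, String regex1, String name2, String regex2) {
        Map<String, String> m = new LinkedHashMap<>();
        m.put(name1, regex1);
        m.put(name2, regex2);
        return m;
    }

    // Rule order from the question: WORD first, and WORD may consist of digits only.
    static final Map<String, String> ORIGINAL =
            rules("WORD", "[A-Za-z0-9]+", "INT", "[0-9]+");
    // First fix: WORD must start with a letter.
    static final Map<String, String> LETTER_FIRST =
            rules("WORD", "[A-Za-z][A-Za-z0-9]*", "INT", "[0-9]+");
    // Second fix: same rules as the question, INT defined first.
    static final Map<String, String> SWAPPED =
            rules("INT", "[0-9]+", "WORD", "[A-Za-z0-9]+");

    public static void main(String[] args) {
        for (String text : new String[] {"16", "abc", "16abc"}) {
            System.out.println(text
                    + ": original=" + typeOf(ORIGINAL, text)
                    + ", letterFirst=" + typeOf(LETTER_FIRST, text)
                    + ", swapped=" + typeOf(SWAPPED, text));
        }
    }
}
```

In this model, `16` comes out as WORD under the original ordering and as INT under both fixes. The two fixes differ on input such as `16abc`: with the letter-first WORD rule the first token is the INT `16`, while with the swapped order the longer WORD match claims all of `16abc`.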
