
ANTLR ambiguous grammar?

I have a couple of ANTLR rules that I don't know how to make work.

The first rule is:

STRING_LITERAL
    :  '"' ( EscapeSequence | ~('\\'|'"') )* '"'
    ;

The second rule is:

element
    :  name '=' math_formula
    ;
math_formula
    :  '"' expression '"'
    ;

The expression is a regular C-like expression.

Example of the syntax:

 "count" = "array[3]"

count should be a string, while array[3] should be an expression.

My problem is that the lexer always returns both "count" and "array[3]" as strings, so the parser cannot recognize the expression.
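To make the symptom concrete, here is a minimal hand-written Java sketch (not ANTLR-generated code; the class QuoteLexerDemo and the token labels are illustrative assumptions) that applies only the STRING_LITERAL rule above, plus '=' and whitespace. Because the rule accepts any quoted text, both quoted parts become STRING_LITERAL tokens and the contents of "array[3]" are never examined, so a parser rule like math_formula has nothing to match.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch (not ANTLR output): a tiny lexer that applies the
// STRING_LITERAL rule '"' ( EscapeSequence | ~('\\'|'"') )* '"' to any quote.
class QuoteLexerDemo {
    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (Character.isWhitespace(c)) {
                i++;
            } else if (c == '=') {
                tokens.add("'='");
                i++;
            } else if (c == '"') {
                // Scan to the closing quote, skipping over backslash escapes.
                int j = i + 1;
                while (j < input.length() && input.charAt(j) != '"') {
                    j += (input.charAt(j) == '\\') ? 2 : 1;
                }
                if (j >= input.length()) {
                    throw new IllegalArgumentException("unterminated string at " + i);
                }
                tokens.add("STRING_LITERAL(" + input.substring(i, j + 1) + ")");
                i = j + 1;
            } else {
                throw new IllegalArgumentException("unexpected char: " + c);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Both quoted parts become STRING_LITERAL; "array[3]" is never lexed
        // as an expression, so math_formula can never match.
        System.out.println(tokenize("\"count\" = \"array[3]\""));
    }
}
```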

I'm using the Java target.

EDIT: changed "variable_name" to "count".

EDIT 2: My second attempt is explained below:

I can detect the start of an expression with '= "', but I can't detect the end of the expression in the lexer, which causes false detection of strings when I have two elements separated by ',':

"count1" = "array[1]",
"count2" = "array[2]"

When I used '= "' as START_EXPRESSION, the lexer took the quote ending the first expression and the quote starting the second string as the string ",\n" (a comma followed by a newline), which is obviously incorrect.
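The failure can be reproduced with a small Java sketch. It assumes a stateless lexer with a hypothetical START_EXPRESSION token for '= "', plus STRING_LITERAL, identifiers and single-character punctuation (escape sequences omitted); the class and token names are illustrative. Since the lexer keeps no memory of being inside an expression, the closing quote of "array[1]" opens a new STRING_LITERAL that runs to the opening quote of "count2".

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative stateless lexer for the START_EXPRESSION attempt (not ANTLR output).
// Escape sequences are ignored to keep the sketch short.
class StartExpressionDemo {
    static List<String> tokenize(String in) {
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < in.length()) {
            char c = in.charAt(i);
            if (in.startsWith("= \"", i)) {                 // START_EXPRESSION: '= "'
                out.add("START_EXPRESSION");
                i += 3;
            } else if (c == '"') {                           // STRING_LITERAL: any quoted text
                int end = in.indexOf('"', i + 1);
                if (end < 0) {                               // no closing quote left
                    out.add("ERROR(" + in.substring(i) + ")");
                    break;
                }
                out.add("STRING_LITERAL(" + in.substring(i, end + 1) + ")");
                i = end + 1;
            } else if (Character.isWhitespace(c)) {
                i++;
            } else if (Character.isLetterOrDigit(c)) {       // identifiers and numbers
                int j = i;
                while (j < in.length() && Character.isLetterOrDigit(in.charAt(j))) j++;
                out.add("ID(" + in.substring(i, j) + ")");
                i = j;
            } else {                                         // punctuation such as [ ] ,
                out.add(String.valueOf(c));
                i++;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String input = "\"count1\" = \"array[1]\",\n\"count2\" = \"array[2]\"";
        // The quote closing "array[1]" starts a bogus STRING_LITERAL that ends at
        // the quote opening "count2", so count2 loses its quotes.
        for (String t : tokenize(input)) System.out.println(t);
    }
}
```

On the two-element input, the token list contains STRING_LITERAL(",\n"), count2 comes out as a bare identifier, the following " = " becomes another bogus STRING_LITERAL, and the final quote is left unterminated.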

EDIT 3: Trying syntactic predicates

I changed the STRING_LITERAL rule to:

STRING_LITERAL  
    :   (~('=') '"' ( EscapeSequence | ~('\\'|'"') )* '"')=> '"' ( EscapeSequence | ~('\\'|'"') )* '"'
    ;

It still doesn't work. I also didn't know how to handle the ~('=') in the rule itself, e.g. by assigning an element label to it or something similar.

I can't remember the syntax now, because it's been over 10 years, but one of ANTLR's key strengths is arbitrary-length lookahead with backtracking. So whenever you see a double quote, look ahead to see if the input matches element. If it does, consume the stream as an element; if not, fall back to the STRING_LITERAL rule.
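As a rough illustration of that speculate-then-backtrack control flow, here is a hand-written Java recognizer; ANTLR would generate the equivalent from a syntactic predicate, so treat this only as a sketch. It works directly on characters, and the expression grammar is a hypothetical toy subset (an identifier with an optional [digits] index) standing in for the real C-like expression rule. At each item it remembers the position, tries a whole element, and rewinds to read a plain string if that attempt fails.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of "speculate, then backtrack": at a '"', first try to match a whole
// element ("name" = "expression"); if that fails, rewind and read a plain string.
// The expression grammar is a hypothetical toy subset: IDENT ('[' DIGITS ']')?
class BacktrackDemo {
    private final String in;
    private int pos;

    private BacktrackDemo(String in) { this.in = in; }

    static List<String> parse(String input) {
        BacktrackDemo p = new BacktrackDemo(input);
        List<String> items = new ArrayList<>();
        p.skipWs();
        while (p.pos < p.in.length()) {
            int mark = p.pos;                      // remember where this item starts
            String element = p.tryElement();
            if (element != null) {
                items.add(element);
            } else {
                p.pos = mark;                      // backtrack and fall back to a string
                String s = p.quoted();
                if (s == null) throw new IllegalArgumentException("syntax error at " + mark);
                items.add("STRING(" + s + ")");
            }
            p.skipWs();
            if (p.eat(',')) p.skipWs();            // optional separator between items
        }
        return items;
    }

    // element : STRING '=' '"' expression '"'   (returns null on failure)
    private String tryElement() {
        String name = quoted();
        if (name == null) return null;
        skipWs();
        if (!eat('=')) return null;
        skipWs();
        if (!eat('"')) return null;
        String expr = expression();
        if (expr == null || !eat('"')) return null;
        return "ELEMENT(" + name + " = " + expr + ")";
    }

    // expression : IDENT ('[' DIGITS ']')?   (toy stand-in for a C-like expression)
    private String expression() {
        int start = pos;
        if (pos >= in.length() || !isIdentStart(in.charAt(pos))) return null;
        while (pos < in.length() && isIdentPart(in.charAt(pos))) pos++;
        if (eat('[')) {
            int digits = pos;
            while (pos < in.length() && Character.isDigit(in.charAt(pos))) pos++;
            if (pos == digits || !eat(']')) return null;
        }
        return in.substring(start, pos);
    }

    // '"' (~'"')* '"'   (escapes omitted); returns the contents or null
    private String quoted() {
        if (!eat('"')) return null;
        int end = in.indexOf('"', pos);
        if (end < 0) return null;
        String s = in.substring(pos, end);
        pos = end + 1;
        return s;
    }

    private boolean eat(char c) {
        if (pos < in.length() && in.charAt(pos) == c) { pos++; return true; }
        return false;
    }

    private void skipWs() {
        while (pos < in.length() && Character.isWhitespace(in.charAt(pos))) pos++;
    }

    private static boolean isIdentStart(char c) { return Character.isLetter(c) || c == '_'; }

    private static boolean isIdentPart(char c) { return Character.isLetterOrDigit(c) || c == '_'; }

    public static void main(String[] args) {
        System.out.println(parse("\"count1\" = \"array[1]\",\n\"count2\" = \"array[2]\", \"plain\""));
    }
}
```

For the input used in main, this yields two ELEMENT items followed by STRING(plain). The price of this approach is that the quoted name is scanned twice whenever the element attempt fails.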


I delved back into the ANTLR reference guide and found the syntactic predicate example. Adapting that, I think your rule would look something like this:

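// ANTLR 2 syntax: "protected" lexer rules are helpers that are only
// called from other lexer rules, never returned to the parser as tokens.
protected
STRING : whatever...
    ;
protected
EXPRESSION : whatever...
    ;
// Syntactic predicate: try EXPRESSION first; if it matches, emit an
// EXPRESSION token, otherwise fall back to STRING.
STRING_OR_EXPR
    : ( EXPRESSION ) => EXPRESSION { $setType(EXPRESSION); }
    | STRING { $setType(STRING); }
    ;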
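The predicate in this rule means "trial-match EXPRESSION without consuming input; if that succeeds, match it for real and set the token type, otherwise match STRING". The Java sketch below emulates that control flow by hand; it is not what ANTLR generates. The EXPRESSION definition is a hypothetical stand-in (a quoted letters-[digits] form such as "array[3]"), and the private helper methods play the role of the protected rules.

```java
import java.util.ArrayList;
import java.util.List;

// Emulates STRING_OR_EXPR : ( EXPRESSION ) => EXPRESSION {$setType(EXPRESSION);}
//                         | STRING {$setType(STRING);} ;
// The predicate is a trial match that always rewinds; only then is the chosen
// alternative matched for real and the token type set.
class PredicateLexerDemo {
    private final String in;
    private int pos;

    private PredicateLexerDemo(String in) { this.in = in; }

    static List<String> tokenize(String input) {
        PredicateLexerDemo lx = new PredicateLexerDemo(input);
        List<String> tokens = new ArrayList<>();
        while (lx.pos < input.length()) {
            char c = input.charAt(lx.pos);
            if (Character.isWhitespace(c)) { lx.pos++; continue; }
            if (c != '"') { tokens.add(String.valueOf(c)); lx.pos++; continue; }
            int mark = lx.pos;
            boolean isExpr = lx.expression();   // guess ...
            lx.pos = mark;                      // ... and rewind, whether or not it matched
            String type;
            if (isExpr) {
                lx.expression();                // match the chosen alternative for real
                type = "EXPRESSION";
            } else if (lx.string()) {
                type = "STRING";
            } else {
                throw new IllegalArgumentException("unterminated string at " + mark);
            }
            tokens.add(type + "(" + input.substring(mark, lx.pos) + ")");
        }
        return tokens;
    }

    // Hypothetical EXPRESSION helper: '"' LETTER+ '[' DIGIT+ ']' '"'
    private boolean expression() {
        if (!eat('"')) return false;
        int letters = pos;
        while (pos < in.length() && Character.isLetter(in.charAt(pos))) pos++;
        if (pos == letters || !eat('[')) return false;
        int digits = pos;
        while (pos < in.length() && Character.isDigit(in.charAt(pos))) pos++;
        return pos > digits && eat(']') && eat('"');
    }

    // STRING helper: '"' (~'"')* '"'   (escape sequences omitted for brevity)
    private boolean string() {
        if (!eat('"')) return false;
        int end = in.indexOf('"', pos);
        if (end < 0) return false;
        pos = end + 1;
        return true;
    }

    private boolean eat(char c) {
        if (pos < in.length() && in.charAt(pos) == c) { pos++; return true; }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("\"count\" = \"array[3]\""));
    }
}
```

Note that a realistic C-like EXPRESSION would also match a plain quoted name such as "count", so the predicate by itself only picks between alternatives; it does not remove the ambiguity between the two token kinds.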

It's hard to tell what the parser actually receives, given the way it is displayed on this SO page and maybe given quotes you added for emphasis. So pardon this basic guess, but if ANTLR effectively gets

"variable_name" = "array[3]"

(note the quotes), it would see two STRING_LITERAL tokens separated by an equals sign, a sequence for which it probably doesn't have any rule.

Perhaps

variable_name = "array[3]"

or maybe better

variable_name = array[3]

is what you are trying to do.

EDIT:
Now that you've clarified that name is a STRING (defined elsewhere, without quotes), it is clear that the guesses above are "starting to" be correct. However, there is another problem: unless expression is defined using characters that are forbidden in a STRING_LITERAL, math_formula will be ambiguous with STRING_LITERAL. The lexer will therefore turn the quoted formula into a STRING_LITERAL token, so the parser sees not an element but a name '=' STRING_LITERAL sequence, for which it has no rule.

What kind of screwball language are you trying to parse? I'd venture to guess that your best bet is to add some state to your lexer along these lines:

ASSIGN:
    ('=' '"')=> /* assuming whitespace doesn't exist */
     '=' {some_global_flaggy_thing=1;}   // '=' directly before a quote: an expression follows
    |'='
    ;
STRING_LITERAL:
    {some_global_flaggy_thing==1}? '"' {$type=QUOTE; some_global_flaggy_thing=2;}   // opening quote of the expression
    |{some_global_flaggy_thing==2}? '"' {$type=QUOTE; some_global_flaggy_thing=0;}  // closing quote of the expression
    | '"' /* normal string literal stuff */ '"'
    ;

Of course, your embedded expression can't have string literals in it.
Note: I'm more familiar with ANTLR 2.
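Here is a minimal Java sketch of that flag-based lexer, written by hand to show the state transitions rather than as ANTLR output. The flag values 0/1/2 mirror some_global_flaggy_thing, the token names ASSIGN, QUOTE, STRING_LITERAL and ID are illustrative, and, like the grammar fragment, it assumes no whitespace between '=' and the opening quote. Quotes seen while the flag is set become QUOTE tokens, so the text between them is lexed as expression tokens instead of being swallowed into a string.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the stateful-lexer idea: a flag remembers whether we are just after
// '=', inside an embedded expression, or in normal text. Flag values mirror
// some_global_flaggy_thing; assumes no whitespace between '=' and '"'.
class FlagLexerDemo {
    static List<String> tokenize(String in) {
        List<String> out = new ArrayList<>();
        int flag = 0;   // 0 = normal, 1 = '=' seen right before '"', 2 = inside expression
        int i = 0;
        while (i < in.length()) {
            char c = in.charAt(i);
            if (Character.isWhitespace(c)) {
                i++;
            } else if (c == '=') {
                // ('=' '"')=> : only an '=' directly followed by a quote starts an expression
                if (i + 1 < in.length() && in.charAt(i + 1) == '"') flag = 1;
                out.add("ASSIGN");
                i++;
            } else if (c == '"' && flag == 1) {
                out.add("QUOTE");                    // opening quote of the expression
                flag = 2;
                i++;
            } else if (c == '"' && flag == 2) {
                out.add("QUOTE");                    // closing quote of the expression
                flag = 0;
                i++;
            } else if (c == '"') {                   // ordinary string literal
                int end = in.indexOf('"', i + 1);
                if (end < 0) throw new IllegalArgumentException("unterminated string at " + i);
                out.add("STRING_LITERAL(" + in.substring(i, end + 1) + ")");
                i = end + 1;
            } else if (Character.isLetterOrDigit(c) || c == '_') {
                int j = i;
                while (j < in.length() && (Character.isLetterOrDigit(in.charAt(j)) || in.charAt(j) == '_')) j++;
                out.add("ID(" + in.substring(i, j) + ")");
                i = j;
            } else {
                out.add(String.valueOf(c));          // punctuation such as [ ] ,
                i++;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        for (String t : tokenize("\"count1\"=\"array[1]\",\n\"count2\"=\"array[2]\"")) {
            System.out.println(t);
        }
    }
}
```

On the two-element example from the question (with the spaces around '=' removed), "count1" and "count2" stay STRING_LITERAL tokens and the comma is no longer absorbed into a string.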
