简体   繁体   中英

Matching parentheses in ANTLR

I'm new to Antlr and I met one issue with parentheses matching recently. Each node in the parse tree has the form (Node1,W1,W2,Node2) , where Node1 and Node2 are two nodes and W1 and W2 are two weights between them. Given an input file as (1,C,10,2).((2,P,2,3).(3,S,3,2))*.(2,T,2,4) , the parse tree looks wrong, where the operator is not the parent of those nodes and the parentheses are not matched. 在此处输入图像描述

The parse file I wrote is like this:

grammar Semi;

prog
    : expr+
    ;
    
expr
    : expr '*'   
    | expr ('.'|'+') expr 
    | tuple   
    | '(' expr ')' 
    ;

tuple
    : LP NODE W1 W2 NODE RP
    ;

LP  : '(' ;
RP  : ')' ;
W1  : [PCST0];
W2  : [0-9]+;
NODE: [0-9]+;


WS  : [ \t\r\n]+ -> skip ;    // toss out whitespace
COMMA: ',' -> skip;

It seems like expr| '(' expr ')' expr| '(' expr ')' doesn't work correctly. So what should I do to make this parser detects if parentheses belong to the node or not? Update: There are two errors in the command:

line 1:1 no viable alternative at input '(1'
line 1:13 no viable alternative at input '(2'

So it seems like the lexer didn't detect the tuples, but why is that?

Your W2 and NODE rules are the same, so nodes you intend to be NODE are matching W2.

grun with -tokens option: (notice, no NODE tokens)

[@0,0:0='(',<'('>,1:0]
[@1,1:1='1',<W2>,1:1]
[@2,3:3='C',<W1>,1:3]
[@3,5:6='10',<W2>,1:5]
[@4,8:8='2',<W2>,1:8]
[@5,9:9=')',<')'>,1:9]
[@6,10:10='.',<'.'>,1:10]
[@7,11:11='(',<'('>,1:11]
[@8,12:12='(',<'('>,1:12]
[@9,13:13='2',<W2>,1:13]
[@10,15:15='P',<W1>,1:15]
[@11,17:17='2',<W2>,1:17]
[@12,19:19='3',<W2>,1:19]
[@13,20:20=')',<')'>,1:20]
[@14,21:21='.',<'.'>,1:21]
[@15,22:22='(',<'('>,1:22]
[@16,23:23='3',<W2>,1:23]
[@17,25:25='S',<W1>,1:25]
[@18,27:27='3',<W2>,1:27]
[@19,29:29='2',<W2>,1:29]
[@20,30:30=')',<')'>,1:30]
[@21,31:31=')',<')'>,1:31]
[@22,32:32='*',<'*'>,1:32]
[@23,33:33='.',<'.'>,1:33]
[@24,34:34='(',<'('>,1:34]
[@25,35:35='2',<W2>,1:35]
[@26,37:37='T',<W1>,1:37]
[@27,39:39='2',<W2>,1:39]
[@28,41:41='4',<W2>,1:41]
[@29,42:42=')',<')'>,1:42]
[@30,43:42='<EOF>',<EOF>,1:43]

If I replace the NODEs in your parse rule with W2s (sorry, I have no idea what this is supposed to represent), I get:

在此处输入图像描述

It appears that your misconception is that the recursive descent parsing starts with the parser rule and when it encounters a Lexer rule, attempts to match it.

This is not how ANTLR works. With ANTLR, your input is first run through the Lexer (aka Tokenizer) to produce a stream of tokens. This step knows absolutely nothing about your parser rules. (That's why it's so often useful to use grun to dump the stream of tokens, this gives you a picture of what your parser rules are acting upon (and you can see, in your example that there are no NODE tokens, because they all matched W2).

Also, a suggestion... It would appear that commas are an essential part of correct input (unless (1C102).((2P23).(3S32))*.(2T24) is considered valid input. On that assumption, I removed the -> skip and added them to your parser rule (that's why you see them in the parse tree). The resulting grammar I used was:

grammar Semi;

prog: expr+;

expr: expr '*' | expr ('.' | '+') expr | tuple | LP expr RP;

tuple: LP W2 COMMA W1 COMMA W2 COMMA W2 RP;

LP: '(';
RP: ')';
W1: [PCST0];
W2: [0-9]+;
NODE: [0-9]+;

WS: [ \t\r\n]+ -> skip; // toss out whitespace
COMMA: ',';

To take a bit more liberty with your grammar, I'd suggest that your Lexer rules should be raw type focused. And, that you can use labels to make the various elements or your tuple more easily accessible in your code. Here's an example:

grammar Semi;

prog: expr+;

expr: expr '*' | expr ('.' | '+') expr | tuple | LP expr RP;

tuple: LP nodef=INT COMMA w1=PCST0 COMMA w2=INT COMMA nodet=INT RP;

LP: '(';
RP: ')';
PCST0: [PCST0];
INT: [0-9]+;

COMMA: ',';
WS: [ \t\r\n]+ -> skip; // toss out whitespace

With this change, your tuple Context class will have accessors for w1, w1, and node. node will be an array of NBR tokens as I've defined it here.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM