
ANTLR4 Token recognition at whitespace

I am new to working with the ANTLR parser generator.

Here is my grammar:

grammar Commands;

file_ : expression EOF;
expression : Command WhiteSpace Shape ;

WhiteSpace : [\t]+ -> skip;
NewLine : ('\r'?'\n'|'\r') -> skip;
Shape : ('square'|'triangle'|'circle'|'hexagon'|'line');
Command : ('fill'|'draw'|'delete');

I am trying to parse a list of sentences such as:

draw circle;
draw triangle;
delete circle;

I'm getting

token recognition error at:' '

Can anyone tell me what the problem is? PS: I'm working in Java 15.

UPDATE

file_ : expressions EOF;
expressions 
            : expressions expression
            | expression 
            ;
expression : Command WhiteSpace Shape NewLine ;

WhiteSpace : [\t]+ -> skip;
NewLine : ('\r'?'\n'|'\r') -> skip;
Shape : ('square'|'triangle'|'circle'|'hexagon'|'line');
Command : ('fill'|'draw'|'delete');

Added support for multiple expressions. I'm getting the same error.

UPDATE

grammar Commands;

file_ : expressions EOF;
expressions
            : expressions expression
            | expression
            ;
expression : Command Shape;

WhiteSpace : [\t]+ -> skip;
NewLine : ('\r'?'\n'|'\r') -> skip;
Shape : ('square'|'triangle'|'circle'|'hexagon'|'line');
Command : ('fill'|'draw'|'delete');

Even if I don't include WhiteSpace, I get the same token recognition error.

OK, the errors:

line 3:6 token recognition error at: ' '
line 3:13 token recognition error at: ';'

mean that the lexer encountered a space character (and a semicolon), but no lexer rule matches either of them. You must include them in your grammar. Let's say you add them like this (note: still incorrect):

Semi       : ';';
WhiteSpace : [ \t]+ -> skip;
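
To see why the original rules fail on a space and a semicolon, here is a minimal sketch in Python. It is a toy model of a lexer, not ANTLR's actual algorithm (ANTLR picks the longest match, while this sketch simply takes the first rule that matches), and the names RULES and tokenize are made up for illustration:

```python
import re

# Toy model of the lexer rules from the question: (token name, regex, skipped?)
RULES = [
    ("WhiteSpace", r"\t+", True),  # tabs only, exactly as in the question
    ("NewLine", r"\r?\n|\r", True),
    ("Shape", r"square|triangle|circle|hexagon|line", False),
    ("Command", r"fill|draw|delete", False),
]

def tokenize(text, rules=RULES):
    """Return (tokens, errors); a char no rule matches becomes an error."""
    tokens, errors, pos = [], [], 0
    while pos < len(text):
        for name, pattern, skipped in rules:
            m = re.compile(pattern).match(text, pos)
            if m:
                if not skipped:
                    tokens.append((name, m.group()))
                pos = m.end()
                break
        else:
            # No rule matched at this position: report it and move on.
            errors.append(f"token recognition error at: {text[pos]!r}")
            pos += 1
    return tokens, errors

tokens, errors = tokenize("draw circle;")
print(tokens)  # [('Command', 'draw'), ('Shape', 'circle')]
print(errors)
```

The two reported errors are for ' ' and ';', the same two characters ANTLR complains about.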

When trying with the rules above, you'd get the error:

line 1:5 missing WhiteSpace at 'circle'

This means the parser cannot match the rule expression : Command WhiteSpace Shape; to the input draw circle;. The reason is that the lexer skips all whitespace characters, so WhiteSpace tokens are discarded and never reach the parser. Remove WhiteSpace from your parser rules.
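
The effect of -> skip can be sketched the same way. This is again a toy Python model rather than ANTLR itself, and RULES and token_types are hypothetical names: skipped tokens never appear in the stream the parser sees, so a parser rule that asks for WhiteSpace can never match.

```python
import re

# Toy model of the lexer rules: (token name, regex, skipped?)
RULES = [
    ("Semi", r";", False),
    ("WhiteSpace", r"[ \t]+", True),  # matched, then thrown away
    ("Shape", r"square|triangle|circle|hexagon|line", False),
    ("Command", r"fill|draw|delete", False),
]

def token_types(text):
    """Return the token types that would be handed to the parser."""
    types, pos = [], 0
    while pos < len(text):
        for name, pattern, skipped in RULES:
            m = re.compile(pattern).match(text, pos)
            if m:
                if not skipped:
                    types.append(name)
                pos = m.end()
                break
        else:
            raise ValueError(f"token recognition error at: {text[pos]!r}")
    return types

stream = token_types("draw circle;")
print(stream)  # ['Command', 'Shape', 'Semi']
# A rule expecting Command WhiteSpace Shape cannot match this stream:
print(stream[:3] == ["Command", "WhiteSpace", "Shape"])  # False
# A rule expecting Command Shape Semi can:
print(stream == ["Command", "Shape", "Semi"])  # True
```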

You'll also see the error:

line 1:11 mismatched input ';' expecting <EOF>

which means the input contains a Semi token, and the parser did not expect that. Include the Semi token in your expression rule:

grammar Commands;

file_ : expression EOF;
expression : Command Shape Semi;

Semi : ';';
WhiteSpace : [ \t]+ -> skip;
NewLine : ('\r'?'\n'|'\r') -> skip;
Shape : ('square'|'triangle'|'circle'|'hexagon'|'line');
Command : ('fill'|'draw'|'delete');

The grammar above will work for single expressions. If you want to match multiple expressions, you could do:

expressions
            : expressions expression
            | expression
            ;

ANTLR 4 accepts direct left recursion like this, but given that ANTLR generates LL parsers (not LR, as the name ANTLR might suggest), it is easier (and makes the parse tree easier to traverse later on) to do this:

expressions
 : expression+
 ;
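
To illustrate the "easier to traverse" point, here is a rough Python sketch of the two tree shapes, using nested tuples as stand-ins for parse tree nodes. It only approximates what ANTLR builds, and the function names are invented for this example:

```python
def left_recursive_tree(items):
    """Shape for: expressions : expressions expression | expression ;"""
    tree = ("expressions", items[0])
    for item in items[1:]:
        # Each additional expression wraps the previous tree one level deeper.
        tree = ("expressions", tree, item)
    return tree

def flat_tree(items):
    """Shape for: expressions : expression+ ;"""
    return ("expressions", *items)

def depth(node):
    """Nesting depth of a tuple-based tree; plain strings count as leaves."""
    if not isinstance(node, tuple):
        return 0
    return 1 + max((depth(child) for child in node[1:]), default=0)

exprs = ["draw circle;", "draw triangle;", "delete circle;"]
print(left_recursive_tree(exprs))
print(flat_tree(exprs))
print(depth(left_recursive_tree(exprs)), depth(flat_tree(exprs)))  # 3 1
```

With the flat form, all expressions are direct children of one node, so a single loop visits them; the nested form grows one level per expression.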

If you're going to skip all white space chars, you might as well remove the NewLine rule and do this:

WhiteSpace : [ \t\r\n]+ -> skip;

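One more thing: the lexer now produces only two keyword token types, so every shape keyword becomes a Shape token and every command keyword becomes a Command token, and you would have to inspect the token text to tell them apart. I'd do something like this instead: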

shape    : Square | Triangle | ...;

Square   : 'square';
Triangle : 'triangle';
...

which will make your life easier while traversing the parse tree when you want to evaluate the input (if that is what you're going to do).

I'd go for something like this:

grammar Commands;

file_       : expressions EOF;
expressions : expression+;
expression  : command shape Semi;
shape       : Square | Triangle | Circle | Hexagon | Line;
command     : Fill | Draw | Delete;

Semi        : ';';
WhiteSpace  : [ \t\r\n]+ -> skip;
Square      : 'square';
Triangle    : 'triangle';
Circle      : 'circle';
Hexagon     : 'hexagon';
Line        : 'line';
Fill        : 'fill';
Draw        : 'draw';
Delete      : 'delete';
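
As a sanity check of what this grammar accepts, here is a small hand-written recognizer in Python that mirrors it. It is only a sketch under the assumption that one token type per keyword is wanted; it is not generated by ANTLR, and TOKEN_SPEC, lex and parse are names chosen for this example:

```python
import re

# One token type per keyword, mirroring the lexer rules above (toy model).
TOKEN_SPEC = [
    ("Semi", r";"),
    ("WhiteSpace", r"[ \t\r\n]+"),
    ("Square", r"square"), ("Triangle", r"triangle"), ("Circle", r"circle"),
    ("Hexagon", r"hexagon"), ("Line", r"line"),
    ("Fill", r"fill"), ("Draw", r"draw"), ("Delete", r"delete"),
]
SHAPES = {"Square", "Triangle", "Circle", "Hexagon", "Line"}
COMMANDS = {"Fill", "Draw", "Delete"}
MASTER = re.compile("|".join(f"(?P<{name}>{rx})" for name, rx in TOKEN_SPEC))

def lex(text):
    """Return the non-skipped token types, or raise on an unknown character."""
    types, pos = [], 0
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:
            raise SyntaxError(f"token recognition error at: {text[pos]!r}")
        if m.lastgroup != "WhiteSpace":  # WhiteSpace is skipped
            types.append(m.lastgroup)
        pos = m.end()
    return types

def parse(text):
    """file_ : expression+ EOF ;  expression : command shape Semi ;"""
    types, result, i = lex(text), [], 0
    while True:
        chunk = types[i:i + 3]
        if (len(chunk) != 3 or chunk[0] not in COMMANDS
                or chunk[1] not in SHAPES or chunk[2] != "Semi"):
            raise SyntaxError(f"bad expression at token {i}: {chunk}")
        result.append((chunk[0], chunk[1]))
        i += 3
        if i == len(types):  # reached EOF after a complete expression
            return result

print(parse("draw circle;\ndraw triangle;\ndelete circle;\n"))
# [('Draw', 'Circle'), ('Draw', 'Triangle'), ('Delete', 'Circle')]
```

Because each keyword has its own token type, the result can be acted on by type alone, without comparing token text.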

Your whitespace token rule WhiteSpace only allows tabs. Add a space to it:

WhiteSpace : [ \t]+ -> skip;

(Usually there's more to a whitespace rule than that, but it should solve your immediate problem.)

You also haven't accounted for the ';' in your input. Either add it to a rule, or remove it from your test input temporarily.

expression : Command Shape ';' ;

This would fix it, but seems like it might not be what you really need.
