ANTLR4 Token recognition at whitespace

I am new to working with the ANTLR parser.

Here is my grammar:

grammar Commands;

file_ : expression EOF;
expression : Command WhiteSpace Shape ;

WhiteSpace : [\t]+ -> skip;
NewLine : ('\r'?'\n'|'\r') -> skip;
Shape : ('square'|'triangle'|'circle'|'hexagon'|'line');
Command : ('fill'|'draw'|'delete');

I am trying to parse a list of sentences such as:

draw circle;
draw triangle;
delete circle;

I'm getting:

token recognition error at:' '

Can anyone tell me what the problem is? PS: I'm working in Java 15.

UPDATE

file_ : expressions EOF;
expressions 
            : expressions expression
            | expression 
            ;
expression : Command WhiteSpace Shape NewLine ;

WhiteSpace : [\t]+ -> skip;
NewLine : ('\r'?'\n'|'\r') -> skip;
Shape : ('square'|'triangle'|'circle'|'hexagon'|'line');
Command : ('fill'|'draw'|'delete');

Added support for multiple expressions. I'm getting the same error.

UPDATE

grammar Commands;

file_ : expressions EOF;
expressions
            : expressions expression
            | expression
            ;
expression : Command Shape;

WhiteSpace : [\t]+ -> skip;
NewLine : ('\r'?'\n'|'\r') -> skip;
Shape : ('square'|'triangle'|'circle'|'hexagon'|'line');
Command : ('fill'|'draw'|'delete');

Even if I don't include WhiteSpace, I get the same token recognition error.

OK, the errors:

line 3:6 token recognition error at: ' '
line 3:13 token recognition error at: ';'

mean that the lexer encountered a whitespace character (or a semicolon), but there is no lexer rule that matches either of these characters. You must include them in your grammar. Let's say you add them like this (note: still incorrect!):

Semi       : ';';
WhiteSpace : [ \t]+ -> skip;

When trying with the rules above, you'd get the error:

line 1:5 missing WhiteSpace at 'circle'

This means the parser cannot match the rule expression : Command WhiteSpace Shape; to the input draw circle;. This is because inside the lexer, you're skipping all whitespace characters, so these tokens are never available inside a parser rule. Remove the WhiteSpace reference from your parser rule.

You'll also see the error:
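As a minimal sketch of that change, the fragment below drops WhiteSpace from the parser rule and keeps the skip action in the lexer. The commented-out hidden-channel variant is not part of the original grammar; it is an alternative ANTLR4 offers if the whitespace tokens need to stay in the token stream for tooling, and the parser still does not match them.

```antlr
// Parser rule: no WhiteSpace reference, because the parser never receives those tokens.
expression : Command Shape;

// Option 1 (as in the rules above): discard whitespace in the lexer.
WhiteSpace : [ \t]+ -> skip;

// Option 2 (alternative, use instead of option 1): keep the tokens,
// but route them to the hidden channel so parser rules ignore them.
// WhiteSpace : [ \t]+ -> channel(HIDDEN);
```

Either way, whitespace belongs only in the lexer; the parser rule lists just the tokens that carry meaning.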

line 1:11 mismatched input ';' expecting <EOF>

which means the input contains a Semi token, and the parser did not expect that. Include the Semi token in your expression rule:

grammar Commands;

file_ : expression EOF;
expression : Command Shape Semi;

Semi : ';';
WhiteSpace : [ \t]+ -> skip;
NewLine : ('\r'?'\n'|'\r') -> skip;
Shape : ('square'|'triangle'|'circle'|'hexagon'|'line');
Command : ('fill'|'draw'|'delete');

The grammar above will work for single expressions. If you want to match multiple expressions, you could do:

expressions
            : expressions expression
            | expression
            ;

but given that ANTLR generates LL parsers (not LR, as the name ANTLR might suggest), it is easier (and makes the parse tree easier to traverse later on) to do this:

expressions
 : expression+
 ;

If you're going to skip all whitespace characters, you might as well remove the NewLine rule and do this:

WhiteSpace : [ \t\r\n]+ -> skip;

One more thing: the lexer now creates Shape and Command tokens, so every shape has the same token type, and so does every command. I'd do something like this instead:

shape    : Square | Triangle | ...;

Square   : 'square';
Triangle : 'triangle';
...

which will make your life easier while traversing the parse tree when you want to evaluate the input (if that is what you're going to do).

I'd go for something like this:
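One optional way to make that traversal easier still is to label the matched token. This is only a sketch: the label name kind is hypothetical and does not appear in the original grammar, while the token names follow the keywords already used in the question.

```antlr
// "kind" is an illustrative label; ANTLR stores the matched token under it
// in the generated context object for the rule.
shape   : kind=(Square | Triangle | Circle | Hexagon | Line);
command : kind=(Fill | Draw | Delete);
```

With the Java target, the generated context classes should then expose the label as a Token field, so a listener or visitor can compare its type against constants such as CommandsParser.Square instead of comparing strings.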

grammar Commands;

file_       : expressions EOF;
expressions : expression+;
expression  : command shape Semi;
shape       : Square | Triangle | Circle | Hexagon | Line;
command     : Fill | Draw | Delete;

Semi        : ';';
WhiteSpace  : [ \t\r\n]+ -> skip;
Square      : 'square';
Triangle    : 'triangle';
Circle      : 'circle';
Hexagon     : 'hexagon';
Line        : 'line';
Fill        : 'fill';
Draw        : 'draw';
Delete      : 'delete';

Your whitespace token rule WhiteSpace only allows for tabs. Add a space to it:

WhiteSpace : [ \t]+ -> skip;

(Usually, there's more to a whitespace rule than that, but it should solve your immediate problem.)

You also haven't accounted for the ';' in your input. Either add it to a rule, or remove it from your test input temporarily.

expression : Command Shape ';' ;

This would fix it, but it seems like it might not be what you really need.
