What's wrong with my simple ANTLR grammar?

I am trying to create a very simple ANTLR grammar file which should parse the following file:

Report (MyReport)
Begin
End

Or, without a report name:

Report
Begin
End

And here is my grammar file:

grammar RL;

options {
  language = Java;
}

report:
  REPORT ('(' SPACE* STRING_LITERAL SPACE* ')')?
  BEGIN
  END
  ;

REPORT
    :   'Report'
    ;     

BEGIN
    :   'Begin'
    ;

END :   'End';

NAME:   LETTER (LETTER | DIGIT | '_')*;

STRING_LITERAL :    NAME SPACE*;

fragment LETTER: LOWER | UPPER;

fragment LOWER: 'a'..'z';

fragment UPPER: 'A'..'Z';

fragment DIGIT: '0'..'9';

fragment SPACE: ' ' | '\t';

WHITESPACE: SPACE+ { $channel = HIDDEN; };

rule: ;

However, when I debug in ANTLRWorks, I always get the following error:

 root -> report -> MismatchedTokenException(0!=0)

What's wrong with my grammar file?

Thanks, Green

A couple of remarks:

  • Java is the default language, so you can omit language=Java;
  • you're using SPACE inside a parser rule, but SPACE is a fragment rule. The lexer never creates a token for a fragment, so remove SPACE from your parser rule(s);
  • the input "Report " ("Report" followed by a single white-space) is tokenized as a STRING_LITERAL, not as a REPORT! ANTLR's lexer consumes characters greedily: only when two or more rules match the same number of characters does the rule defined first get precedence. The lexer does not produce tokens based on what the parser is trying to match (tokenization and parsing are performed independently!).
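The third remark can be illustrated with a small Python sketch. It is only a simplified model of the "longest match wins, first rule wins ties" behaviour described above, not ANTLR's actual implementation; the `tokenize` function and the regular expressions standing in for the lexer rules are assumptions made for illustration.

```python
import re

def tokenize(text, rules):
    """Longest match wins; on a tie, the rule listed first wins."""
    tokens, pos = [], 0
    while pos < len(text):
        best_name, best_len = None, 0
        for name, pattern in rules:
            m = re.compile(pattern).match(text, pos)
            # Strictly longer only, so an earlier rule keeps precedence on ties.
            if m and m.end() - pos > best_len:
                best_name, best_len = name, m.end() - pos
        if best_name is None:
            raise ValueError(f"no rule matches at offset {pos}")
        tokens.append((best_name, text[pos:pos + best_len]))
        pos += best_len
    return tokens

NAME = r"[A-Za-z][A-Za-z0-9_]*"

# Lexer rules from the question, in their original order
# (fragments are inlined, since they never become tokens themselves).
ORIGINAL_RULES = [
    ("REPORT", r"Report"),
    ("BEGIN", r"Begin"),
    ("END", r"End"),
    ("NAME", NAME),
    ("STRING_LITERAL", NAME + r"[ \t]*"),
    ("WHITESPACE", r"[ \t]+"),
]

# The same rules with STRING_LITERAL removed and a non-fragment SPACE rule.
FIXED_RULES = [
    ("REPORT", r"Report"),
    ("BEGIN", r"Begin"),
    ("END", r"End"),
    ("NAME", NAME),
    ("SPACE", r"[ \t\r\n]+"),
]

print(tokenize("Report ", ORIGINAL_RULES))  # [('STRING_LITERAL', 'Report ')]
print(tokenize("Report ", FIXED_RULES))     # [('REPORT', 'Report'), ('SPACE', ' ')]
```

With the original rules, STRING_LITERAL matches seven characters while REPORT matches only six, so the parser never sees a REPORT token. Once no rule can swallow the trailing space, REPORT and NAME tie at six characters and REPORT wins because it is defined first.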

Try the following instead:

grammar RL;

report
 : REPORT ('(' NAME ')')? BEGIN END
 ;

REPORT : 'Report';     
BEGIN  : 'Begin';
END    : 'End';
NAME   : LETTER (LETTER | DIGIT | '_')*;

fragment LETTER : LOWER | UPPER;
fragment LOWER  : 'a'..'z';
fragment UPPER  : 'A'..'Z';
fragment DIGIT  : '0'..'9';

SPACE  : (' ' | '\t' | '\r' | '\n')+ { $channel = HIDDEN; };

green wrote:

What if I want to allow "SPACE" inside Report NAME?

I would still skip spaces in the lexer. Accepting spaces between names but ignoring them in other contexts will result in some clunky rules. Instead of accounting for spaces inside a report's name, I would do something like this:

report
 : REPORT ('(' report_name ')')? BEGIN END
 ;

report_name
 : NAME+
 ;

resulting in a parse tree (image not reproduced here) in which report_name has one NAME child per word, for the input:

Report (a name with spaces)
Begin
End

green wrote:

So is it possible to allow me to use reserved words like 'Report' in the name?

Sure, explicitly add them in the report_name rule:

report_name
 : (NAME | REPORT | BEGIN | END)+
 ;
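To see what this rule accepts, here is a hand-written recursive-descent sketch in Python. It is not the parser ANTLR generates; `lex`, `parse_report` and the `allow_keywords` switch are hypothetical names introduced only to compare `NAME+` with `(NAME | REPORT | BEGIN | END)+`.

```python
import re

KEYWORDS = {"Report": "REPORT", "Begin": "BEGIN", "End": "END"}
TOKEN = re.compile(r"\s+|([A-Za-z][A-Za-z0-9_]*)|([()])")

def lex(text):
    """Return (type, text) pairs; whitespace is dropped, like the hidden SPACE channel."""
    tokens, pos = [], 0
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if not m:
            raise SyntaxError(f"unexpected character {text[pos]!r}")
        word, paren = m.groups()
        if word:
            # A keyword and NAME match the same length; the keyword rule comes first.
            tokens.append((KEYWORDS.get(word, "NAME"), word))
        elif paren:
            tokens.append((paren, paren))
        pos = m.end()
    return tokens

def parse_report(text, allow_keywords=True):
    """report : REPORT ('(' report_name ')')? BEGIN END ; returns the name words."""
    tokens = lex(text)
    pos = 0

    def expect(kind):
        nonlocal pos
        if pos >= len(tokens) or tokens[pos][0] != kind:
            found = tokens[pos][0] if pos < len(tokens) else "EOF"
            raise SyntaxError(f"expected {kind}, found {found}")
        pos += 1

    # allow_keywords=False models report_name : NAME+
    name_kinds = {"NAME"} | (set(KEYWORDS.values()) if allow_keywords else set())
    words = []
    expect("REPORT")
    if pos < len(tokens) and tokens[pos][0] == "(":
        expect("(")
        # report_name : (NAME | REPORT | BEGIN | END)+
        while pos < len(tokens) and tokens[pos][0] in name_kinds:
            words.append(tokens[pos][1])
            pos += 1
        if not words:
            raise SyntaxError("report_name needs at least one word")
        expect(")")
    expect("BEGIN")
    expect("END")
    return words

print(parse_report("Report (a name with spaces)\nBegin\nEnd"))
print(parse_report("Report (my Report Begin End)\nBegin\nEnd"))
```

Because the closing parenthesis ends the name, keywords inside it cannot be confused with the BEGIN and END that follow. Calling `parse_report` with `allow_keywords=False` on the second input raises a SyntaxError, which is the situation the extended rule is meant to avoid.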
