How to write a grammar to capture inline comments while ignoring lines with just comments?

I am writing a grammar for a new language I'm developing. The language defines comments as follows:

  1. A comment can be either an "inline" or an "only-line" comment.
  2. "Inline" comments start with #.
  3. "Only-line" comments start with either # or *.
  4. Every language statement ends with a newline.
  5. "Only-line" comments can be ignored.
  6. "Inline" comments must be processed (their value is passed to the tree walker in the code-generation phase).
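To make the rules above concrete, here is a minimal Python sketch that classifies one line at a time. The helper classify_line is hypothetical (it is not part of the author's grammar), and it assumes that # never appears inside a statement itself, so the first # on a statement line always starts the inline comment.

```python
from typing import Optional, Tuple


def classify_line(line: str) -> Tuple[str, Optional[str]]:
    """Classify one source line (hypothetical helper, not the author's code).

    Returns (kind, inline_comment): kind is "empty", "only-line" or
    "statement"; inline_comment is the trailing '#' comment of a statement
    line, or None if there is none.
    """
    stripped = line.strip()
    if not stripped:
        return "empty", None
    # Rules 3 and 5: a line starting with '#' or '*' is an "only-line"
    # comment and can be ignored.
    if stripped[0] in "#*":
        return "only-line", None
    # Rule 2: on a statement line, '#' starts an inline comment that must be
    # kept. Assumption: '#' never occurs inside the statement itself.
    _statement, sep, comment = stripped.partition("#")
    return "statement", ("#" + comment) if sep else None


SOURCE = """keyword(0x12, 0x12) # this is an inline comment
keyword(0x34, 0x34) # this is another inline comment

# this is an "only-line" comment
* this is another "only-line" comment
keyword(0x55, 0x55) # this is the 3rd inline comment
"""

for line in SOURCE.splitlines():
    print(classify_line(line))
# ('statement', '# this is an inline comment')
# ('statement', '# this is another inline comment')
# ('empty', None)
# ('only-line', None)
# ('only-line', None)
# ('statement', '# this is the 3rd inline comment')
```

The point of the sketch is that the same # character means two different things depending on whether a statement precedes it on the line, which is exactly the context a context-free lexer rule cannot see.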

Example:

keyword(0x12, 0x12) # this is an inline comment
keyword(0x34, 0x34) # this is another inline comment

# this is an "only-line" comment
* this is another "only-line" comment
keyword(0x55, 0x55) # this is the 3rd inline comment

Here is my (reduced) grammar for this:

statement :   empty_line
          |   comment_statement
          |   keyword_statement
          ;

keyword_statement : 'keyword' '(' HEX_VALUE ',' HEX_VALUE ')' in_line_comment?;

in_line_comment : IN_LINE_COMMENT;

comment_statement : LINE_COMMENT;
empty_line        : NL;

IN_LINE_COMMENT : '#' ~[\r\n]* ;
LINE_COMMENT    : [#*] ~[\r\n]* -> skip;

HEX_VALUE       : '0x' [0-9a-fA-F]+;

NL              : '\r'? '\n' -> channel(2);
WS              : [ \t]+ -> skip;

Compiling the grammar with ANTLR4 and feeding it the example text yields:

[@0,0:6='keyword',<'keyword'>,1:0]
[@1,7:7='(',<'('>,1:7]
[@2,8:11='0x12',<HEX_VALUE>,1:8]
[@3,12:12=',',<','>,1:12]
[@4,14:17='0x12',<HEX_VALUE>,1:14]
[@5,18:18=')',<')'>,1:18]
[@6,20:46='# this is an inline comment',<IN_LINE_COMMENT>,1:20]
[@7,47:47='\n',<NL>,channel=2,1:47]
[@8,48:54='keyword',<'keyword'>,2:0]
[@9,55:55='(',<'('>,2:7]
[@10,56:59='0x34',<HEX_VALUE>,2:8]
[@11,60:60=',',<','>,2:12]
[@12,62:65='0x34',<HEX_VALUE>,2:14]
[@13,66:66=')',<')'>,2:18]
[@14,68:99='# this is another inline comment',<IN_LINE_COMMENT>,2:20]
[@15,100:100='\n',<NL>,channel=2,2:52]
[@16,101:101='\n',<NL>,channel=2,3:0]
[@17,102:133='# this is an "only-line" comment',<IN_LINE_COMMENT>,4:0]
[@18,134:134='\n',<NL>,channel=2,4:32]
[@19,172:172='\n',<NL>,channel=2,5:37]
[@20,173:179='keyword',<'keyword'>,6:0]
[@21,180:180='(',<'('>,6:7]
[@22,181:184='0x55',<HEX_VALUE>,6:8]
[@23,185:185=',',<','>,6:12]
[@24,187:190='0x55',<HEX_VALUE>,6:14]
[@25,191:191=')',<')'>,6:18]
[@26,193:224='# this is the 3rd inline comment',<IN_LINE_COMMENT>,6:20]
[@27,225:225='\n',<NL>,channel=2,6:52]
[@28,226:225='<EOF>',<EOF>,7:0]
line 4:0 extraneous input '# this is an "only-line" comment' expecting {<EOF>, 'keyword', LINE_COMMENT, NL}

This means the "only-line" comment that starts with # is tokenized as IN_LINE_COMMENT, which is wrong: it should be treated as LINE_COMMENT and skipped. (The * comment on line 5 is skipped as intended, which is why no token covers offsets 135-171.)
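Why this happens: when several lexer rules match at the same position, the ANTLR lexer takes the longest match, and if two rules match the same length it picks the one defined first. IN_LINE_COMMENT and LINE_COMMENT both match an entire #... line, so IN_LINE_COMMENT always wins, while a *... line can only match LINE_COMMENT and is skipped. The rough Python model below reproduces that behavior. It is a hand-written simulation, not ANTLR; the names KEYWORD, LPAREN, RPAREN and COMMA are assumptions standing in for the implicit literal tokens, and token channels are not modeled.

```python
import re
from typing import List, Optional, Tuple

# Rough model of ANTLR's lexer rule selection, not ANTLR itself: at each
# position every rule is tried, the longest match wins, and a tie goes to the
# rule defined first. KEYWORD, LPAREN, RPAREN and COMMA stand in for the
# implicit 'keyword' '(' ')' ',' literal tokens; channels are not modeled.
RULES = [
    ("KEYWORD", re.compile(r"keyword")),
    ("LPAREN", re.compile(r"\(")),
    ("RPAREN", re.compile(r"\)")),
    ("COMMA", re.compile(r",")),
    ("IN_LINE_COMMENT", re.compile(r"#[^\r\n]*")),  # listed first: wins ties
    ("LINE_COMMENT", re.compile(r"[#*][^\r\n]*")),  # loses every tie on '#'
    ("HEX_VALUE", re.compile(r"0x[0-9a-fA-F]+")),
    ("NL", re.compile(r"\r?\n")),
    ("WS", re.compile(r"[ \t]+")),
]
SKIPPED = {"LINE_COMMENT", "WS"}  # rules with "-> skip"


def tokenize(text: str) -> List[Tuple[str, str]]:
    tokens: List[Tuple[str, str]] = []
    pos = 0
    while pos < len(text):
        best: Optional[Tuple[str, str]] = None
        for name, pattern in RULES:
            m = pattern.match(text, pos)
            # Strictly longer only: on equal length the earlier rule is kept.
            if m and (best is None or len(m.group()) > len(best[1])):
                best = (name, m.group())
        if best is None:
            raise ValueError(f"no rule matches at offset {pos}")
        name, lexeme = best
        if name not in SKIPPED:
            tokens.append((name, lexeme))
        pos += len(lexeme)
    return tokens


print(tokenize('# this is an "only-line" comment\n* this is another "only-line" comment\n'))
# [('IN_LINE_COMMENT', '# this is an "only-line" comment'), ('NL', '\n'), ('NL', '\n')]
```

Swapping the order of the two comment rules would not help either: LINE_COMMENT would then win every tie, including the genuine inline comments, because the lexer rule itself has no way to know whether a statement came earlier on the line.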

How can I make the grammar treat that comment differently?

OK, I dug into this myself, so as a community service, here is my solution.

I used a semantic predicate in the grammar to solve the problem. The solution currently uses the Java target (just to avoid the complexity of the ANTLR4 Python target), but I will translate the grammar below to Python.

My modified grammar:

@lexer::members {
    int in_line = 0;                                       // initialize to "only-line"
}

prog      : statement+ EOF;

statement :   empty_line
          |   comment_statement
          |   keyword_statement
          ;

keyword_statement : KEYWORD '(' HEX_VALUE ',' HEX_VALUE ')' in_line_comment?;

in_line_comment : IN_LINE_COMMENT;

comment_statement : LINE_COMMENT;
empty_line        : NL;

KEYWORD         : 'keyword' {in_line = 1;};                // a statement has started on this line

IN_LINE_COMMENT : '#' ~[\r\n]* {in_line == 1}?;            // matches only if in_line == 1 at run time
LINE_COMMENT    : [#*] ~[\r\n]* -> skip;                   // used when the predicate above fails

HEX_VALUE       : '0x' [0-9a-fA-F]+;

NL              : '\r'? '\n' {in_line = 0;} -> channel(2);  // reset in_line to 0 after every statement
WS              : [ \t]+ -> skip;
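To see how the predicate changes the outcome, here is a rough Python model of the modified lexer (a hand-written simulation, not the code ANTLR generates). It keeps an in_line flag: KEYWORD sets it to 1, NL resets it to 0, and IN_LINE_COMMENT is only a candidate while in_line == 1. Treating a failed predicate as "this rule is not a candidate" is a simplification of how ANTLR evaluates predicates during matching, assumed here to be equivalent for this grammar; the names LPAREN, RPAREN and COMMA are again stand-ins for the literal tokens.

```python
import re
from typing import List, Optional, Tuple

# Longest match wins, ties go to the earlier rule (a simplified model of the
# ANTLR lexer). LPAREN, RPAREN and COMMA stand in for the literal tokens.
RULES = [
    ("KEYWORD", re.compile(r"keyword")),
    ("LPAREN", re.compile(r"\(")),
    ("RPAREN", re.compile(r"\)")),
    ("COMMA", re.compile(r",")),
    ("IN_LINE_COMMENT", re.compile(r"#[^\r\n]*")),
    ("LINE_COMMENT", re.compile(r"[#*][^\r\n]*")),
    ("HEX_VALUE", re.compile(r"0x[0-9a-fA-F]+")),
    ("NL", re.compile(r"\r?\n")),
    ("WS", re.compile(r"[ \t]+")),
]
SKIPPED = {"LINE_COMMENT", "WS"}  # rules with "-> skip"


class CommentLexer:
    def __init__(self) -> None:
        self.in_line = 0  # 0 = "only-line" context, 1 = after a KEYWORD

    def tokenize(self, text: str) -> List[Tuple[str, str]]:
        tokens: List[Tuple[str, str]] = []
        pos = 0
        while pos < len(text):
            best: Optional[Tuple[str, str]] = None
            for name, pattern in RULES:
                # Models the {in_line == 1}? predicate on IN_LINE_COMMENT.
                if name == "IN_LINE_COMMENT" and self.in_line != 1:
                    continue
                m = pattern.match(text, pos)
                if m and (best is None or len(m.group()) > len(best[1])):
                    best = (name, m.group())
            if best is None:
                raise ValueError(f"no rule matches at offset {pos}")
            name, lexeme = best
            # Models the lexer actions {in_line = 1;} and {in_line = 0;}.
            if name == "KEYWORD":
                self.in_line = 1
            elif name == "NL":
                self.in_line = 0
            if name not in SKIPPED:
                tokens.append((name, lexeme))
            pos += len(lexeme)
        return tokens


SOURCE = """keyword(0x12, 0x12) # this is an inline comment
keyword(0x34, 0x34) # this is another inline comment

# this is an "only-line" comment
* this is another "only-line" comment
keyword(0x55, 0x55) # this is the 3rd inline comment
"""

comments = [lex for kind, lex in CommentLexer().tokenize(SOURCE) if kind == "IN_LINE_COMMENT"]
print(comments)
# ['# this is an inline comment', '# this is another inline comment', '# this is the 3rd inline comment']
```

Only the three trailing comments come out as IN_LINE_COMMENT; both "only-line" comments fall through to LINE_COMMENT and are skipped. Because the flag lives in the lexer, a fresh lexer instance (or a reset of in_line) is needed for each input.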
