Why are multi-line comments in flex/bison so evasive?

Question

I'm trying to parse C-style multi-line comments in my flex (.l) file:

%s ML_COMMENT
%%

...

<INITIAL>"/*"                   BEGIN(ML_COMMENT);
<ML_COMMENT>"*/"                BEGIN(INITIAL);  
<ML_COMMENT>[.\n]+              { }

I'm not returning any token, and my grammar (.y) doesn't address comments in any way.

When I run my executable, I get a parse error:

$ ./a.out
/*
abc 
def
Parse error: parse error
$ echo "/* foo */" | ./a.out
Parse error: parse error

(My yyerror function does a printf("Parse error: %s\n"), which is where the first half of the redundant error message comes from.)

I can see why the second example fails: the entire input is a comment, and since the grammar ignores comments, there are no statements. Thus the input isn't a valid program. But the first example throws a parse error before I even finish the comment.

Also confusing:

$ ./a.out
/* foo */
a = b;
Parse error: parse error

In this case, the comment is closed before the actual valid input (which parses fine without the comment). The failure actually occurs after parsing "a", not after attempting to parse the assignment "a = b;". If I enter "a" on its own line, it still throws an error.

Given that the error message comes from the parser and not the scanner, is there something crucial I'm missing in my .y file? Or am I doing something wrong in my scanner rules that propagates over to the parser side?

EDIT: Per @Rudi's suggestion, I turned on debugging and found:

$ ./a.out
Starting parse
Entering state 0
Reading a token: /*
foo
Next token is 44 (IDENTIFER)
Shifting token 44 (IDENTIFER), Entering state 4
Reducing via rule 5 (line 130), IDENTIFER  -> identifier
state stack now 0
Entering state 5

I turned off debugging and found that "/* foo */ = bar;" indeed parses the same as "foo = bar;". I'm using flex 2.5.4; it doesn't give me any warnings about the stateful rules I'm attempting to use.

Answer 1

Parsing comments this way can lead to errors because:

you need to add conditions to all of your lex rules
it becomes even more complex if you also want to handle // comments
you still run the risk that yacc/bison merges two comments, including everything in between

In my parser, I handle comments like this. First, define lex rules for the start of a comment:

\/\*     {
         if (!SkipComment())
            return(-1);
         }

\/\/     {
         if (!SkipLine())
            return(-1);
         }

Then write the SkipComment and SkipLine functions. They need to consume all input until the end of the comment is found (this is rather old code, so forgive the somewhat archaic constructions):

bool SkipComment (void)
{
int Key;

Key=!EOF;
while (true)
   {
   if (Key==EOF)
      {
      /* yyerror("Unexpected EOF within comment."); */
      break;
      }
   switch ((char)Key)
      {
      case '*' :
         Key=input();
         if ((char)Key=='/') return true;
         else               continue;
         break;
      case '\n' :
         ++LineNr;
         break;
      }
   Key=input();
   }

return false;
}

bool SkipLine (void)
{
int Key;

Key=!EOF;
while (true)
   {
   if (Key==EOF)
      return true;
   switch ((char)Key)
      {
      case '\n' :
         unput('\n');
         return true;
         break;
      }
   Key=input();
   }

return false;
}
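The consumption loop in SkipComment/SkipLine can be tried outside of flex. Below is a minimal sketch under stated assumptions: it reads from an in-memory string through hypothetical helpers src_input/src_unput (stand-ins for flex's input()/unput()), and keeps the line counter in a struct instead of the global LineNr. It is an illustration of the same technique, not a drop-in replacement for the answer's code.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>   // EOF

// Hypothetical stand-in for flex's input buffer: a cursor over a string.
struct source {
    const char *text;
    size_t pos;
    int line_nr;     // plays the role of LineNr in the answer
};

// Like input(): returns the next character, or EOF at the end of the text.
static int src_input(struct source *src)
{
    char c = src->text[src->pos];
    if (c == '\0')
        return EOF;
    src->pos++;
    return (unsigned char)c;
}

// Like unput() of the character just read: step the cursor back by one.
static void src_unput(struct source *src)
{
    if (src->pos > 0)
        src->pos--;
}

// Call after the opening slash-star was matched. Consumes up to and including
// the closing star-slash; returns false if EOF is reached first.
bool skip_comment(struct source *src)
{
    int key = src_input(src);
    while (key != EOF) {
        if (key == '*') {
            key = src_input(src);
            if (key == '/')
                return true;
            continue;   // re-examine key: handles "**" runs and a newline after '*'
        }
        if (key == '\n')
            src->line_nr++;
        key = src_input(src);
    }
    return false;
}

// Call after "//" was matched. Consumes up to the newline and pushes the
// newline back so the scanner still sees it. Always succeeds, like SkipLine.
bool skip_line(struct source *src)
{
    int key = src_input(src);
    while (key != EOF && key != '\n')
        key = src_input(src);
    if (key == '\n')
        src_unput(src);
    return true;
}
```

For example, with the text "a\nb */rest" the comment skipper stops right before "rest" and has counted one newline.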

Answer 2

I think you need to declare your ML_COMMENT start condition as an exclusive start condition, so that only the ML_COMMENT rules are active: %x ML_COMMENT instead of %s ML_COMMENT.

Otherwise, rules with no start condition are also active.
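To see how %s versus %x produces the symptom in the question, here is a hand-written C emulation (a sketch, not what flex generates) of a scanner that has an identifier rule with no start condition. The function name count_identifiers and the flag exclusive are hypothetical; real flex picks rules by longest match and rule order, and this code simply hard-codes the outcome for these few rules.

```c
#include <ctype.h>
#include <string.h>

// Hypothetical emulation of the question's two scanner states.
enum scan_state { ST_INITIAL, ST_ML_COMMENT };

// Counts how many IDENTIFIER tokens would reach the parser.
// exclusive == 0 models "%s ML_COMMENT": the identifier rule has no start
// condition, so it stays active inside the comment.
// exclusive != 0 models "%x ML_COMMENT": only the comment rules apply there.
int count_identifiers(const char *s, int exclusive)
{
    enum scan_state st = ST_INITIAL;
    int count = 0;

    while (*s != '\0') {
        if (st == ST_INITIAL && strncmp(s, "/*", 2) == 0) {
            st = ST_ML_COMMENT;   // <INITIAL>"/*"  BEGIN(ML_COMMENT);
            s += 2;
            continue;
        }
        if (st == ST_ML_COMMENT && strncmp(s, "*/", 2) == 0) {
            st = ST_INITIAL;      // <ML_COMMENT>"*/"  BEGIN(INITIAL);
            s += 2;
            continue;
        }
        int ident_rule_active = (st == ST_INITIAL) || !exclusive;
        if (ident_rule_active && (isalpha((unsigned char)*s) || *s == '_')) {
            while (isalnum((unsigned char)*s) || *s == '_')
                s++;              // longest match for the identifier rule
            count++;              // token handed to the parser
            continue;
        }
        s++;                      // other characters are not counted in this sketch
    }
    return count;
}
```

For the input "/* foo */ = bar;" the inclusive variant counts two identifiers (foo and bar), which lines up with the debug trace where foo reached the parser as IDENTIFER; the exclusive variant counts only bar.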

Answer 3

I found this description of the C language grammar (actually just the lexer) very useful. I think it is mostly the same as Patrick's answer, but slightly different.

http://www.lysator.liu.se/c/ANSI-C-grammar-l.html

Answer 4

Besides the problem with %x vs %s, you also have the problem that the "." in "[.\n]" matches only a literal "." and not "any character other than newline" the way a bare "." does. You want a rule like

<ML_COMMENT>.|"\n"     { /* do nothing */ }

instead.
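A small C model of the two patterns may make the difference concrete. The helper names are hypothetical, and each function only mimics how many characters the corresponding flex pattern would match at the start of a string.

```c
#include <stddef.h>

// Length of the longest prefix of s matched by the flex pattern [.\n]+
// Inside a character class '.' is literal, so only '.' and newline qualify.
size_t match_bracket_dot_newline(const char *s)
{
    size_t n = 0;
    while (s[n] == '.' || s[n] == '\n')
        n++;
    return n;
}

// Length matched by the flex pattern .|\n : exactly one character of any kind,
// since '.' covers everything except newline and \n covers newline.
size_t match_any_char(const char *s)
{
    return s[0] != '\0' ? 1 : 0;
}
```

At "foo */" the bracket class matches nothing, so in the question's scanner some other active rule has to claim foo, while .|"\n" consumes it one character at a time inside the comment.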

Answers (4):
Answer 1: score 5, posted 2010-11-10 16:38:13
Answer 2: score 5, accepted, posted 2010-11-10 17:14:16
Answer 3: score 1, posted 2013-02-11 23:28:31
Answer 4: score 1, posted 2010-11-10 18:56:06