Seg fault using flex, yy_scan_string(), and input() until end of input

The following flex code seems to create an executable that seg faults when the input string is an unterminated comment. Note that:

  • this only occurs when the input buffer is a string (e.g. with yy_scan_string()) rather than a file (yyset_in())
  • the seg fault occurs when the flex code tries to invoke fread() on a NULL file pointer, even though no file buffer was ever used
  • the "else return 0" prevents the bug from occurring, but I am unclear as to whether this is a safe workaround in the context of multiple buffers
  • the code below is for illustrative purposes only, so please forgive its not-so-realistic nature
  • I tried using unput() instead of "return 0" and it did not seem to make any difference

This is a flex bug, right? Why else would it be calling fread()? In any case, is "return 0" the right way to solve this problem, or could it cause other problems, e.g. for nested or multiple buffers?

bug.l:

%{
// to compile: flex bug.l; gcc -g lex.yy.c
// to run: ./a.out -s "/* unterminated comment"
#include <stdio.h>   // printf
#include <string.h>  // strcmp
%}

%option reentrant stack noyywrap

%%

"/*" {
  size_t len;
  int c1 = 0, c2 = input(yyscanner);

  for(len = 1;;len++) {
    if(c2 == EOF || !c2) break;
    if(c1 == '*' && c2 == '/') break;
    c1 = c2;
    c2 = input(yyscanner);
  }

  if(c1 == '*' && c2 == '/')
    printf("found end of comment\n");
  // else return 0; // without this line, string input of "/*" will cause a seg fault
 }

.|\n

<<EOF>> { return 0; }
%%

int main ( int argc, char * argv[] ) {    
  if(argc < 3 || (strcmp(argv[1], "-s") && strcmp(argv[1], "-f")))
    printf("Usage:\n  %s -s string_to_scan\n  %s -f file_to_scan\n", argv[0], argv[0]);
  else {
    yyscan_t scanner;
    YY_BUFFER_STATE buf;

    if(!strcmp(argv[1], "-s")) {
      yylex_init ( &scanner );
      buf = yy_scan_string(argv[2], scanner);
      yylex ( scanner );
      yylex_destroy ( scanner );
    } else {
      FILE *f = fopen(argv[2], "rb");
      if(f) {
        yylex_init ( &scanner );
        yyset_in (f, scanner);
        yylex ( scanner );
        yylex_destroy ( scanner );
      }
    }
  }
  return 0;
}

As written, you will end up trying to read the EOF twice if the comment is not terminated. That is, you read it once (using input()), and then, since you don't return from the lexical action, you ask flex to continue scanning, which requires it to read another input character.

That is undefined behaviour. Once you recognize an EOF, you must return 0 from the scanner.

Your <<EOF>> rule is unnecessary; without the rule, the scanner will automatically respond to EOF by returning 0. Normally the only reason to insert an EOF rule is to do some kind of clean-up or to switch input buffers. If you do have a non-trivial EOF rule, it must either return 0 or arrange for the current input buffer not to be retried.
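The "read the EOF twice" problem can be modelled outside flex. The following is a minimal sketch in plain C, not flex's generated code: `Reader`, `reader_from`, `next_char`, `skip_comment`, `scan`, and the `stop_at_eof` flag are all hypothetical names made up for this illustration. The `reads_after_eof` counter records how often the driver asks for input after end of input has already been reported. In this toy model the extra read is harmless; in a real flex scanner it is the undefined behaviour described above.

```c
#include <stdio.h>   /* EOF */
#include <string.h>  /* strlen */

/* Hypothetical stand-in for a scanner's string buffer (not a flex type). */
typedef struct {
    const char *text;
    size_t pos;
    size_t len;
    int eof_seen;          /* set once EOF has been handed out */
    int reads_after_eof;   /* reads attempted after EOF was already seen */
} Reader;

Reader reader_from(const char *text) {
    Reader r = { text, 0, strlen(text), 0, 0 };
    return r;
}

/* Return the next character, or EOF once the buffer is exhausted. */
int next_char(Reader *r) {
    if (r->pos >= r->len) {
        if (r->eof_seen) r->reads_after_eof++;
        r->eof_seen = 1;
        return EOF;
    }
    return (unsigned char)r->text[r->pos++];
}

/* Skip up to the comment terminator; return 1 if found, 0 if EOF came first. */
int skip_comment(Reader *r) {
    int prev = 0, cur = next_char(r);
    while (cur != EOF) {
        if (prev == '*' && cur == '/') return 1;
        prev = cur;
        cur = next_char(r);
    }
    return 0;
}

/* Scan the text and return the number of comments that were closed.
 * With stop_at_eof set, an unterminated comment ends the scan at once
 * (the analogue of "return 0" in the action); otherwise the loop goes
 * back for another character, as the original action does. */
int scan(Reader *r, int stop_at_eof) {
    int closed = 0;
    int c;
    while ((c = next_char(r)) != EOF) {
        if (c == '/' && r->pos < r->len && r->text[r->pos] == '*') {
            r->pos++;                      /* consume the '*' */
            if (skip_comment(r)) {
                closed++;
            } else if (stop_at_eof) {
                return closed;             /* EOF seen: stop scanning */
            }
        }
    }
    return closed;
}
```

Scanning `a /* ok */ b /* unterminated` with `stop_at_eof` set leaves `reads_after_eof` at 0; with the flag cleared, the same input leaves it at 1, which is the second EOF read.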

If I understand your code correctly, I cannot agree that flex has a bug. You are reading the input stream inside the action code for the rule "/*", which goes against the idea of lexical analysis. The code you provided probably prevents the scanner from setting its end-of-file flag, so after running the action for "/*" the scanner tries to read a stream that has already ended. To handle comments you should use flex start conditions (states).

Edit:

Your code should look more like this:

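%{
#include <stdio.h>   // printf
#include <string.h>  // strcmp
%}

%option reentrant stack noyywrap

/* exclusive start condition: only <COMMENT> rules are active inside a comment */
%x COMMENT

%%

"/*" { BEGIN(COMMENT); }
<COMMENT>"*/" { BEGIN(INITIAL); }
<COMMENT>(.|\n) { /* code for ignored comment characters, including newlines */ }
.|\n { /* code to handle non-comment characters */ }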

%%

int main ( int argc, char * argv[] ) {
  if(argc < 3 || (strcmp(argv[1], "-s") && strcmp(argv[1], "-f")))
    printf("Usage:\n  %s -s string_to_scan\n  %s -f file_to_scan\n", argv[0], argv[0]);
  else {
    yyscan_t scanner;
    YY_BUFFER_STATE buf;

    if(!strcmp(argv[1], "-s")) {
      yylex_init ( &scanner );
      buf = yy_scan_string(argv[2], scanner);
      yylex ( scanner );
      yylex_destroy ( scanner );
    } else {
      FILE *f = fopen(argv[2], "rb");
      if(f) {
        yylex_init ( &scanner );
        yyset_in (f, scanner);
        yylex ( scanner );
        yylex_destroy ( scanner );
      }
    }
  }
  return 0;
}
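The start-condition approach boils down to a two-state machine. As a rough illustration outside flex, here is a plain C sketch of that machine; `ScanState`, `ScanResult`, and `scan_states` are hypothetical names invented for this example and are not part of flex. It assumes a comment body includes newlines and that input ending while in the comment state simply stops the scan.

```c
#include <stddef.h>  /* size_t */

/* Hypothetical states mirroring the INITIAL and COMMENT start conditions. */
typedef enum { STATE_INITIAL, STATE_COMMENT } ScanState;

typedef struct {
    size_t code_chars;      /* characters seen outside comments */
    size_t comment_chars;   /* characters skipped inside comments */
    ScanState final_state;  /* STATE_COMMENT means a comment never closed */
} ScanResult;

/* Walk a NUL-terminated string, switching state on the comment delimiters. */
ScanResult scan_states(const char *text) {
    ScanResult res = { 0, 0, STATE_INITIAL };
    ScanState state = STATE_INITIAL;
    size_t i = 0;

    while (text[i] != '\0') {
        /* text[i + 1] is safe to read: at worst it is the terminating NUL */
        if (state == STATE_INITIAL && text[i] == '/' && text[i + 1] == '*') {
            state = STATE_COMMENT;          /* like BEGIN(COMMENT) */
            i += 2;
        } else if (state == STATE_COMMENT && text[i] == '*' && text[i + 1] == '/') {
            state = STATE_INITIAL;          /* like BEGIN(INITIAL) */
            i += 2;
        } else {
            if (state == STATE_COMMENT)
                res.comment_chars++;
            else
                res.code_chars++;
            i++;
        }
    }
    res.final_state = state;                /* end of input: stop, never read again */
    return res;
}
```

A caller can detect an unterminated comment by checking whether `final_state` is `STATE_COMMENT` once the function returns, without ever reading past the end of the input.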
