
Seg fault using flex, yy_scan_string(), and input() until end of input

The following flex code seems to create an executable that seg faults when the input string is an unterminated comment. Note that:

  • this only occurs when the input buffer is a string (e.g. set up with yy_scan_string()) rather than a file (yyset_in())
  • the seg fault occurs when the flex-generated code tries to invoke fread() on a NULL file pointer, even though no file buffer was ever used
  • uncommenting the "else return 0" line prevents the crash, but I am unclear as to whether this is a safe workaround in the context of multiple buffers
  • the below code is for illustrative purposes only so please forgive the not-so-realistic nature of it.
  • I tried using unput() instead of "return 0", and it did not seem to make any difference

This is a flex bug, right? (Why else would it be calling fread()?) In any case, is "return 0" the right way to solve this problem, or could it cause other problems, e.g. for nested or multiple buffers?

bug.l:

// to compile: flex bug.l; gcc -g lex.yy.c
// to run: ./a.out -s "/* unterminated comment"

%option reentrant stack noyywrap

%%

"/*" {
  size_t len;
  int c1 = 0, c2 = input(yyscanner);

  for(len = 1;;len++) {
    if(c2 == EOF || !c2) break;
    if(c1 == '*' && c2 == '/') break;
    c1 = c2;
    c2 = input(yyscanner);
  }

  if(c1 == '*' && c2 == '/')
    printf("found end of comment\n");
  // else return 0; // without this line, string input of "/*" will cause a seg fault
 }

.|\n

<<EOF>> { return 0; }
%%

int main ( int argc, char * argv[] ) {    
  if(argc < 3 || (strcmp(argv[1], "-s") && strcmp(argv[1], "-f")))
    printf("Usage:\n  %s -s string_to_scan\n  %s -f file_to_scan\n", argv[0], argv[0]);
  else {
    yyscan_t scanner;
    YY_BUFFER_STATE buf;

    if(!strcmp(argv[1], "-s")) {
      yylex_init ( &scanner );
      buf = yy_scan_string(argv[2], scanner);
      yylex ( scanner );
      yylex_destroy ( scanner );
    } else {
      FILE *f = fopen(argv[2], "rb");
      if(f) {
        yylex_init ( &scanner );
        yyset_in (f, scanner);
        yylex ( scanner );
        yylex_destroy ( scanner );
      }
    }
  }
  return 0;
}

As written, you will end up trying to read the EOF twice if the comment is not terminated. That is, you read it once (using input() ), and then, since you don't return from the lexical action, you request flex to continue scanning the input which requires it to read an input character.

That is undefined behaviour. Once you recognize an EOF, you must return 0 from the scanner.
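The same rule can be sketched outside flex in plain C. This is only an illustration, not flex code: skip_comment is a hypothetical helper that plays the role of the input() loop in the "/*" action, and its return value tells the caller whether end of input was reached, so the caller knows to stop instead of asking for more characters.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of flex. Starting at s[*pos] (just past an
 * opening slash-star), look for the closing star-slash.
 * Returns 1 if the comment was terminated, 0 if end of input came first.
 * On return, *pos is the index just past the terminator, or the index of
 * the terminating NUL when the comment was never closed. */
int skip_comment(const char *s, size_t *pos)
{
    int prev = 0;

    assert(s != NULL && pos != NULL);
    for (;;) {
        int c = (unsigned char)s[*pos];
        if (c == '\0')
            return 0;   /* end of input: the caller must stop scanning here */
        (*pos)++;
        if (prev == '*' && c == '/')
            return 1;   /* found the end of the comment */
        prev = c;
    }
}
```

In the flex action, the equivalent of a 0 result is the case where input() has already reported end of input, which is where the action should return 0 rather than fall through.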

Your <<EOF>> rule is unnecessary; without the rule, the scanner will automatically respond to EOF by returning 0. Normally the only reason to insert an EOF rule is to do some kind of clean-up or to switch input buffers. If you do have a non-trivial EOF rule, it must either return 0 or arrange for the current input buffer to not be retried.

If I understood your code correctly, I cannot agree that flex has a bug. You are reading the input stream inside the action code for the "/*" rule, which in my view goes against the idea of lexical analysis. Your code probably prevents the scanner from setting its end-of-file flag, so after the action for "/*" finishes, the scanner tries to read a stream that has already ended. To handle comments you should use flex start conditions.

Edit:

Your code should rather look as follows:

%option reentrant stack noyywrap

%x COMMENT

%%

"/*" { BEGIN(COMMENT); }
<COMMENT>"*/" { BEGIN(INITIAL); }
<COMMENT>.|\n { /* code for ignored comment characters, including newlines */ }
.|\n { /* code to handle non-comment characters */ }

%%

int main ( int argc, char * argv[] ) {
  if(argc < 3 || (strcmp(argv[1], "-s") && strcmp(argv[1], "-f")))
    printf("Usage:\n  %s -s string_to_scan\n  %s -f file_to_scan\n", argv[0], argv[0]);
  else {
    yyscan_t scanner;
    YY_BUFFER_STATE buf;

    if(!strcmp(argv[1], "-s")) {
      yylex_init ( &scanner );
      buf = yy_scan_string(argv[2], scanner);
      yylex ( scanner );
      yylex_destroy ( scanner );
    } else {
      FILE *f = fopen(argv[2], "rb");
      if(f) {
        yylex_init ( &scanner );
        yyset_in (f, scanner);
        yylex ( scanner );
        yylex_destroy ( scanner );
      }
    }
  }
  return 0;
}
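For readers less familiar with start conditions, the scanner above behaves roughly like a two-state machine. The sketch below restates that idea in plain C, under the assumption that the only goal is to tell comment text from other text; count_code_chars and the state names are hypothetical and are not generated by flex.

```c
#include <assert.h>
#include <stddef.h>

enum scan_state { STATE_INITIAL, STATE_COMMENT };

/* Hypothetical illustration: count the characters of s that lie outside
 * slash-star comments. *unterminated is set to 1 if the input ends while
 * still inside a comment, 0 otherwise. */
size_t count_code_chars(const char *s, int *unterminated)
{
    enum scan_state st = STATE_INITIAL;
    size_t n = 0;

    assert(s != NULL && unterminated != NULL);
    for (size_t i = 0; s[i] != '\0'; i++) {
        if (st == STATE_INITIAL) {
            if (s[i] == '/' && s[i + 1] == '*') {
                st = STATE_COMMENT;   /* like BEGIN(COMMENT) */
                i++;                  /* consume the '*' as well */
            } else {
                n++;                  /* non-comment character */
            }
        } else if (s[i] == '*' && s[i + 1] == '/') {
            st = STATE_INITIAL;       /* like BEGIN(INITIAL) */
            i++;                      /* consume the '/' as well */
        }
    }
    *unterminated = (st == STATE_COMMENT);
    return n;
}
```

Reaching the end of the string while still in STATE_COMMENT corresponds to an unterminated comment. The loop simply ends there, which is the plain-C counterpart of the scanner returning 0 at end of input.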


 