
Using lex/flex outside of yacc/bison

I am currently using bison and flex to parse linear temporal logic formulas and generate automata from them. I am using flex in the "default" way, i.e. writing the token value to yylval if necessary and returning the bison token identifier.

I'm using a separate routine to scan the input file. Input consists of identifier names and integer/real numbers, both of which flex can already handle. Example:

x    y
3.20 78.3
3.31 76.2
3.32 77.4
//etc

My question is this: is it possible to use flex to scan the input file, and if so, how? I know how to switch flex's input buffer, but how would I get the token values after that? If the solution is more complex than just having a separate routine, then I'm not going to bother doing it.

I am using C, by the way, but I will eventually need to port this to C++.

Thanks.

If the problem is as simple as your example indicates, you could just use fscanf. The problem with fscanf is probably the same as the problem with your flex-generated scanner: newlines are simply treated as whitespace, so neither can tell where one row ends and the next begins. If your scanner returns a special token for newlines (or can be convinced to do so, see below), then read on.
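For comparison, the fscanf route might look like the sketch below. The function names and the 1024-byte line limit are assumptions made for illustration. The header is read with fgets because %s would happily consume numbers as names, and the row reader makes the newline caveat concrete: %lf skips newlines like any other whitespace, so a short or long row goes unnoticed until the column count drifts.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: reads the first line and counts the
   whitespace-separated names on it. Returns -1 if no line could be read.
   Assumes the header fits in 1023 characters. */
int count_header_columns(FILE *fp) {
    char line[1024];
    if (!fgets(line, sizeof line, fp)) return -1;
    int count = 0;
    for (char *tok = strtok(line, " \t\r\n"); tok != NULL;
         tok = strtok(NULL, " \t\r\n"))
        ++count;
    return count;
}

/* Hypothetical helper: reads n numbers into out.
   Returns n on success, 0 if EOF is reached before the first number,
   -1 on malformed input or a truncated final row.
   Row boundaries are NOT checked: %lf skips newlines as whitespace. */
int fscanf_read_n_numbers(FILE *fp, double *out, int n) {
    for (int i = 0; i < n; ++i) {
        int rc = fscanf(fp, "%lf", &out[i]);
        if (rc == EOF && i == 0) return 0;
        if (rc != 1) return -1;
    }
    return n;
}
```

A caller would invoke count_header_columns once and then call fscanf_read_n_numbers with that count until it returns 0.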

If I understand what you want to do correctly, you just need to set up the input correctly and then repeatedly call yylex. If you are using the default non-reentrant API, then it might look something like this:

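#include <stdlib.h>  // realloc, free
#include <string.h>  // strdup (POSIX)

// Returns a count, and sets the out parameter to a
// malloc'd array of malloc'd strings.
// On error, frees everything, leaves *out as NULL and returns -1.
// The free function for the success case is left as an exercise.
int read_headers(char*** out) {
   int count = 0;
   char** id_array = NULL;
   int token;
   *out = NULL;
   while ((token = yylex()) == TOKEN_IDENTIFIER) {
     // Keep the old pointer until realloc succeeds.
     char** grown = realloc(id_array, (count + 1) * sizeof *id_array);
     if (!grown) goto fail;
     id_array = grown;
     if (!(id_array[count] = strdup(yytext))) goto fail;
     ++count;
   }
   if (token != '\n') goto fail;  // Header must end with a newline token
   *out = id_array;
   return count;
fail:
   while (count > 0) free(id_array[--count]);
   free(id_array);
   return -1;
}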

// Reads exactly n numbers and stores them in the out parameter,
// which must point at an array of size n.
// Returns: n on success. 0 if EOF reached. -1 on error
int read_n_numbers(double* out, int n) {
  for (int i = 0; i < n; ++i, ++out) {
    int token = yylex();
    if (token != TOKEN_NUMBER) {
      if (i == 0 && token == 0) return 0;
      // Error: non-number or too few numbers
      return -1;
    }
    *out = yylval.number;
  }
  if (yylex() != '\n') {
    // Error: too many numbers
    return -1;
  }
  return n;
}
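
To see the calling pattern end to end without generating a scanner first, the sketch below pairs a hand-written stand-in for yylex with a small driver. Everything in it is an assumption made for illustration: the token codes, the layout of yylval, the stand-in scanner and the read_table function. With a real scanner, flex and bison supply yylex, yytext, yyin, yylval and the token codes, and only the driver would remain.

```c
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>

/* Assumed token codes and semantic value; bison normally generates these. */
enum { TOKEN_IDENTIFIER = 258, TOKEN_NUMBER = 259 };
union { double number; } yylval;

/* Stand-ins for what the flex-generated scanner normally provides. */
FILE *yyin;
char yytext[64];

/* Illustrative stand-in for yylex(): skips blanks, returns '\n' for a
   newline, 0 at EOF, and classifies every other word. */
int yylex(void) {
    int c = getc(yyin);
    while (c == ' ' || c == '\t' || c == '\r') c = getc(yyin);
    if (c == EOF) return 0;
    if (c == '\n') return '\n';
    size_t len = 0;
    while (c != EOF && !isspace(c)) {
        if (len + 1 < sizeof yytext) yytext[len++] = (char)c;
        c = getc(yyin);
    }
    if (c != EOF) ungetc(c, yyin);
    yytext[len] = '\0';
    if (isalpha((unsigned char)yytext[0])) return TOKEN_IDENTIFIER;
    char *end;
    yylval.number = strtod(yytext, &end);
    return *end == '\0' ? TOKEN_NUMBER : 1;  /* 1: unrecognized word */
}

/* Reads the header row, then rows of numbers, into cells (row-major).
   Stores the column count in *ncols. Returns the number of rows read,
   or -1 on a malformed table or when cells is too small. */
int read_table(FILE *in, double *cells, int max_cells, int *ncols) {
    yyin = in;  /* with a real flex scanner, yyrestart(in) switches files later */
    int token, cols = 0;
    while ((token = yylex()) == TOKEN_IDENTIFIER) ++cols;
    if (token != '\n' || cols == 0) return -1;
    *ncols = cols;

    int rows = 0;
    for (;;) {
        token = yylex();
        if (token == 0) return rows;  /* EOF at a row boundary */
        for (int i = 0; i < cols; ++i) {
            if (i > 0) token = yylex();
            if (token != TOKEN_NUMBER || rows * cols + i >= max_cells) return -1;
            cells[rows * cols + i] = yylval.number;  /* value set by yylex */
        }
        if (yylex() != '\n') return -1;  /* too many numbers on the row */
        ++rows;
    }
}
```

The point of the driver is that each token's value is read from yylval (or yytext) immediately after the yylex call that returned it, exactly as a bison parser would do.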

It's likely that your scanner does not actually return '\n' for a newline. That would rarely be useful in an expression grammar (although sometimes it is). But it's easy to modify the scanner to handle newlines on demand; you simply need to use a start condition. We make it an inclusive start condition because it only needs to handle newlines, but beware that this means that all unmarked rules are also active. Since flex always prefers the longest match, an unmarked rule that matched several newlines at once would win over the start-condition rule, so you need to make sure that the unmarked rule which handles newlines also matches only a single newline.

%s RETURN_NEWLINES
%%
<RETURN_NEWLINES>\n { return '\n'; }
\n                  ; // Ignore a single newline
[ \t]+              ; // Ignore any number of horizontal whitespace

With that in place, you can enable newlines by simply calling BEGIN(RETURN_NEWLINES) prior to scanning (and BEGIN(INITIAL) to go back to ignoring them). You'll need to place the functions which enable and disable newline scanning in your flex definition file, because BEGIN and the start condition names are macros defined inside the generated scanner and are not exported.
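
Such wrapper functions might look like the following sketch of the flex definition file. The two function names are hypothetical, and the scanner's existing token rules are only indicated by a comment. The functions go in the user code section after the second %%, where BEGIN and RETURN_NEWLINES are visible.

```lex
%s RETURN_NEWLINES
%%
<RETURN_NEWLINES>\n { return '\n'; }
\n                  ; /* Ignore a single newline */
[ \t]+              ; /* Ignore any number of horizontal whitespace */
  /* ... the scanner's existing token rules go here ... */
%%
/* Hypothetical wrappers: callable from other C files, unlike BEGIN itself. */
void scanner_return_newlines(void) { BEGIN(RETURN_NEWLINES); }
void scanner_ignore_newlines(void) { BEGIN(INITIAL); }
```

The table-reading code would declare both functions, call scanner_return_newlines() before reading the header, and call scanner_ignore_newlines() before handing the scanner back to the bison parser.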
